Tech blog of Anton Keks

Sunday, November 27, 2011

Reusing Shotwell thumbnails in Nautilus

As I have lots of photos on my machine, thumbnails start to consume a considerable amount of disk space.

Another problem is that gnome-raw-thumbnailer isn't enabled by default in Ubuntu (Natty, Oneiric) anymore, so my raw photos don't get thumbnailed in Nautilus. And if I enable it manually, thumbnails of vertical photos aren't shown with the correct orientation.

So I researched the freedesktop thumbnail spec, the Gnome thumbnailer spec, and how Shotwell stores its thumbnails, and came up with a shell script that reuses Shotwell thumbnails for Nautilus.

Save the script below as /usr/bin/shotwell-raw-thumbnailer

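#!/bin/bash
# Reuses Shotwell's thumbnail for a raw photo, falling back to gnome-raw-thumbnailer
input="$1"   # file:// URI of the photo
output="$2"  # where the thumbnail has to be written

if [ -z "$output" ]; then
    echo "Usage: $0 input output"
    exit 1
fi

# decode the URI to a plain path; the md5 of the URI is the name of the cached thumbnail
file=`printf '%s' "${input##file://}" | perl -pe 's/%([0-9a-f]{2})/sprintf("%s", pack("H2",$1))/eig'`
md5=`printf '%s' "$input" | md5sum | awk '{print $1}'`

# double any single quotes so that they don't break the SQL query
q="'"
file_sql=${file//$q/$q$q}
shotwell_id=`sqlite3 ~/.shotwell/data/photo.db "select id from PhotoTable where filename = '$file_sql'"`
if [ -z "$shotwell_id" ]; then
    gnome-raw-thumbnailer "$input" "$output"
    exit
fi

thumb=`printf '%s/.shotwell/thumbs/thumbs128/thumb%016x.jpg' "$HOME" "$shotwell_id"`
if [ ! -e "$thumb" ]; then
    gnome-raw-thumbnailer "$input" "$output"
    exit
fi

replaceWithLink() {
    sleep 1
    ln -sf "$thumb" ~/.thumbnails/normal/"$md5".png
}

# gnome-thumbnail-factory doesn't support links
cp "$thumb" "$output"

# however, linked thumbnails work, so replace them after a delay
replaceWithLink &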

In order to make it work, you then need to register it as a thumbnailer in Gnome. Put this into /usr/share/thumbnailers/shotwell.thumbnailer:

[Thumbnailer Entry]
Exec=/usr/bin/shotwell-raw-thumbnailer %u %o
MimeType=image/x-3fr;image/x-adobe-dng;image/x-arw;image/x-bay;image/x-canon-cr2;image/x-canon-crw;image/x-cap;image/x-cr2;image/x-crw;image/x-dcr;image/x-dcraw;image/x-dcs;image/x-dng;image/x-drf;image/x-eip;image/x-erf;image/x-fff;image/x-fuji-raf;image/x-iiq;image/x-k25;image/x-kdc;image/x-mef;image/x-minolta-mrw;image/x-mos;image/x-mrw;image/x-nef;image/x-nikon-nef;image/x-nrw;image/x-olympus-orf;image/x-orf;image/x-panasonic-raw;image/x-pef;image/x-pentax-pef;image/x-ptx;image/x-pxn;image/x-r3d;image/x-raf;image/x-raw;image/x-rw2;image/x-rwl;image/x-rwz;image/x-sigma-x3f;image/x-sony-arw;image/x-sony-sr2;image/x-sony-srf;image/x-sr2;image/x-srf;image/x-x3f;

So, what does this script do?

When Gnome (or Nautilus) needs a thumbnail, it runs this script
The script checks if the image has an entry in the Shotwell database (~/.shotwell/data/photo.db)
Then it checks if Shotwell has a thumbnail for it (in ~/.shotwell/thumbs)
If yes, the script returns the already generated thumbnail to Gnome - no generation needed, so it works much faster
If Shotwell doesn't have the thumbnail, the call is delegated to gnome-raw-thumbnailer that generates a new thumbnail, the old-fashioned way
If Shotwell's thumbnail was used, the script will asynchronously replace the thumbnail in ~/.thumbnails with the link to Shotwell's file, avoiding a copy on the disk
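
The two file names involved in these steps can be derived by hand, which is handy when checking what the script did. Below is a minimal sketch; the function names shotwell_thumb_name and nautilus_thumb_name are made up for illustration, and the naming rules are simply the ones used in the script above.

```shell
#!/bin/bash
# shotwell_thumb_name ID: file name Shotwell uses for the thumbnail of photo ID
# (the database id, zero-padded to 16 hex digits, as in the script above)
shotwell_thumb_name() {
    printf 'thumb%016x.jpg\n' "$1"
}

# nautilus_thumb_name URI: file name under ~/.thumbnails/normal for a file URI
# (MD5 of the URI plus .png)
nautilus_thumb_name() {
    printf '%s' "$1" | md5sum | awk '{print $1 ".png"}'
}
```

For example, shotwell_thumb_name 255 prints thumb00000000000000ff.jpg, and nautilus_thumb_name file:///home/user/Photos/2009/IMG_5887.CR2 prints a 32-digit hex name ending in .png.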

The last step is the one that saves disk space. Unfortunately, it is not possible to return a link to Gnome right away - it can't read it for some reason. However, putting a link directly under ~/.thumbnails later works perfectly, even if we put a .jpg file under a .png name (as required by the spec). PNG is actually a worse choice for photo thumbnails due to its lossless compression, so the disk savings are more than twofold with this script.

The next step would be to rewrite this in C or Vala to make it even faster, and maybe even make Shotwell create these links right away when it generates the thumbnails.

Tuesday, October 19, 2010

Simple Rsync GUI: easy backups from Nautilus

Most often, making backups of your important files is a manual process, especially if you are dealing with large collections of photos.

In the meantime I have written a small and convenient Nautilus script (for Gnome users) for doing exactly that.

Features:

Syncs to any mounted location or over SSH (everything that rsync supports)
Remembers previously used locations
Preview of changes (any deletions are shown first, but performed last)
Nice progress bar with upload speed display

Everything is written as a simple bash script using Zenity for the GTK GUI - just drop it into the ~/.gnome2/nautilus-scripts directory, and it will appear in the Nautilus right-click menu, under Scripts.

Don't forget - all this is just a frontend for rsync (which you are too lazy to run from the command line).

Dependencies: nautilus, zenity, rsync, bash

And now, here is the source (save to ~/.gnome2/nautilus-scripts/Sync):

#!/bin/bash
# Nautilus script to sync specified folder to another destination via rsync.
# Put this to ~/.gnome2/nautilus-scripts
# Written by Anton Keks (BSD license)

paths_file="$(readlink -f "$0").paths"
locations=`cat "$paths_file"`
sources=`awk -F'|' '{print $1}' "$paths_file"`

if [ "$1" ]; then
  source=$1 
else
  # add current directory also to the list
  sources=`echo -e "$sources\\n$PWD" | sort -u`
  # ask user to choose one of the sources
  source=`zenity --list --title="Sync source" --text="No source was specified. Please choose what do you want to sync" --column=Source "$sources" Other...` || exit 1
  if [ "$source" = Other... ]; then
    source=`zenity --entry --title="Sync source" --text="Please enter the source path on local computer" --entry-text="$PWD"` || exit 1
  fi
fi

# normalize and remove trailing /
source=`readlink -f "$source"`
source=${source%/}

if [ ! -d "$source" ]; then
  zenity --error --text="$source is not a directory"; exit 2
fi

if [ "$2" ]; then
  # TODO: support multiple sources
  zenity --warning --text="Only one directory can be synched, using $source"
fi

# find matching destinations from stored ones
destinations=""
for s in $sources; do
  if echo "$source" | fgrep -q "$s"; then
    dest=`fgrep "$s" $paths_file | awk -F'|' '{print $2}'`
    suffix=${source#$s}
    suffix=${suffix%/*}
    destinations="$destinations $dest$suffix" 
  fi
done

# ask user to choose one of the matching destinations or enter a new one
dest=`zenity --list --title="Sync destination" --text="Choose where to sync $source" --column=Destination $destinations New...` || exit 3
if [ "$dest" = New... ]; then
  basename=`basename "$source"`
  dest=`zenity --entry --title="Sync destination" --text="Please enter the destination (either local path or rsync's remote descriptor), omitting $basename" --entry-text="user@host:$(dirname $source)"` || exit 3
  echo "$source|$dest" >> $paths_file
fi

# check if user is not trying to do something wrong with rsync
if [ "`basename "$source"`" = "`basename "$dest"`" ]; then
  # sync contents of source to dest
  source="$source/"
fi

log_file=/tmp/Sync.log
rsync_opts=-rltEorzh
echo -e "The following changes will be performed by rsync (see man rsync for info on itemize-changes):\\n$source -> $dest\\n" > $log_file
( echo x; rsync -ni $rsync_opts --delete "$source" "$dest" >> $log_file 2>&1 ) | zenity --progress --pulsate --auto-close --width=350 --title="Retrieving sync information"
# the subshell exits with rsync's status; a variable set inside the pipeline would be lost
rsync_result=${PIPESTATUS[0]}

if [ $rsync_result -ne 0 ]; then
  zenity --error --title="Sync" --text="Rsync failed: `cat $log_file`"; exit 4
fi

num_files=`cat $log_file | wc -l`
num_files=$((num_files-3))

if [ $num_files -le 0 ]; then
  zenity --info --title="Sync" --text="All files are up to date on $dest"; exit
fi

zenity --text-info --title="Sync review ($num_files changes)" --filename=$log_file --width=500 --height=500 || exit 4

num_deleted=`fgrep delet $log_file | wc -l`
if [ $num_deleted -ge 100 ]; then
  zenity --question --title="Sync" --text="$num_deleted files are going to be deleted from $dest, do you still want to continue?" --ok-label="Continue" || exit 4
fi

rsync_progress_awk="{ 
 if (\$0 ~ /to-check/) {
  last_speed=\$(NF-3)
 }
 else {
  print \"#\" \$0 \" - \" files \"/\" $num_files \" - \" last_speed;
  files++;
  print files/$num_files*100 \"%\";
 }
 fflush();
}
END {
 print \"#Done, \" files \" changes, \" last_speed
}"

# note: delete-delay below means that any files will be deleted only as a last step
rsync $rsync_opts --delete-delay --progress "$source" "$dest" | awk "$rsync_progress_awk" | zenity --progress --width=350 --title="Synchronizing $source" || exit 4
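
The "something wrong with rsync" check in the script deserves a word: rsync copies a directory into the destination when the source has no trailing slash, and copies only its contents when it has one. Below is a minimal sketch of that decision on its own; the function name rsync_source_arg is made up for illustration and is not part of the script.

```shell
#!/bin/bash
# rsync_source_arg SOURCE DEST: prints SOURCE with a trailing slash when both
# paths end in the same directory name, so that rsync copies the contents of
# SOURCE into DEST instead of creating DEST/<name>/<name>
rsync_source_arg() {
    local source=${1%/} dest=${2%/}
    if [ "$(basename "$source")" = "$(basename "$dest")" ]; then
        echo "$source/"
    else
        echo "$source"
    fi
}
```

So rsync_source_arg /home/user/Photos backup:/mnt/Photos prints /home/user/Photos/, while rsync_source_arg /home/user/Photos backup:/mnt prints /home/user/Photos (the host name backup is just an example).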

Thursday, December 17, 2009

Deleting thumbnails for nonexistent photos

For some years already, Freedesktop has had a spec on how applications should manage image thumbnails. The spec is now followed by the majority of Gnome and KDE applications, including F-Spot, which is one of the very few applications that use large 256x256 thumbnails under ~/.thumbnails/large.

The spec requires thumbnails to be stored in PNG format, with each file named after the MD5 sum of the URL of the original file, e.g. 81347ce6c37f75513c5e517e5b1895b8.png.

The problem with the spec is that if you delete or move image files, their thumbnails stay there and take up space (for my 20000+ photos I have 1.4 GB of large thumbnails).

Fortunately, you can clean them up from time to time using simple command-line tricks, as the original URLs are stored inside the thumbnail files as Thumb::URI attributes. I don't recommend erasing all of your thumbnails, because regeneration will take time.

In order to create a list of matching thumbnail-original URL pairs, you can run the following in a terminal inside of either .thumbnails/large or .thumbnails/normal directories (it will take some time):

for i in *.png; do
identify -verbose "$i" | \
fgrep Thumb::URI | sed "s@.*Thumb::URI:@$i@" >> uris.txt;
done

This will get you a uris.txt file, where each line looks like the following:

f78c63184b17981fddce24741c7ebd06.png file:///home/user/Photos/2009/IMG_5887.CR2

Note that the provided thumbnail filenames (first tokens) can also be generated the following way from the URLs (second tokens) using MD5 hashes:

echo -n file:///home/user/Photos/2009/IMG_5887.CR2 | md5sum

After you have your uris.txt file, it can be easily processed with any familiar command-line tools, like grep, sed, awk, etc.

For example, in order to delete all thumbnails matching 'Africa', use the following:

for i in `fgrep Africa uris.txt | awk '{print $1}'`; do rm "$i"; done

So, as you can see, it is pretty simple to free a few hundred megabytes (depending on the number of thumbnails you are deleting).

With this kind of trick you can even rename the thumbnails of moved files if you use md5sum to generate the new filenames from the URLs, as shown above. This will save you regeneration time.
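
The same uris.txt file can be used to find the thumbnails whose original photos no longer exist. Here is a minimal sketch, assuming the two-column format shown above and local file:// URLs only; the function names urldecode and list_orphans are made up for illustration.

```shell
#!/bin/bash
# urldecode STRING: turn %XX escapes back into characters
urldecode() {
    printf '%b' "${1//%/\\x}"
}

# list_orphans FILE: print the thumbnails listed in a uris.txt-style FILE
# whose original files are gone
list_orphans() {
    local thumb uri path
    while read -r thumb uri; do
        case $uri in
            file://*) path=$(urldecode "${uri#file://}") ;;
            *) continue ;;  # skip anything that is not a local file
        esac
        [ -e "$path" ] || echo "$thumb"
    done < "$1"
}
```

Run list_orphans uris.txt first and review the output. Once it looks right, the listed names can be removed with list_orphans uris.txt | xargs rm -f from inside the thumbnail directory.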

Wednesday, August 26, 2009

Announcing F-Spot Live Web Gallery extension

I am happy to announce a new extension for F-Spot, the popular Linux photo management application - LiveWebGallery. Once installed, invoke it from the Tools menu in F-Spot's main window.

The extension contains a minimal web server implementation that serves the user's gallery over HTTP and can be viewed with any web browser, even on Mac and Windows. So now you are able to easily share your photos with family, friends, colleagues no matter what operating system and software they use by doing just a few mouse clicks in F-Spot. The only requirement is that they have to be on the same network, or be able to access your machine's IP address in some other way.

As you can see in the screenshot, you can choose whether to share photos with a particular tag, current view in F-Spot (allows you to create an arbitrary query) or the currently selected photos.

To activate the gallery (start the embedded lightweight web server), just click the activate button in the top-right corner. On activation, the URL of the web gallery will appear, allowing you either to open it yourself or to copy the link and give it to other viewers.

After that all the options can still be changed in the dialog and will affect all new viewers or those pressing the browser's reload button.

Most of us already know that many pictures are rarely viewed after they are taken (Point, Shoot, Kiss It Goodbye). F-Spot tries to fix this with its very powerful tagging features - tags make it much easier to find photos taken long ago. This, however, is no magic - the chance of finding the right photos when needed depends on how well you tag. This extension makes tagging even more useful, because other people can help you with the most difficult part - proper tagging can sometimes be a lot of work. With this extension, you can delegate some of it to other people! The gallery is not read-only - if you choose so, an editable tag can be selected and viewers can add/remove this tag from photos (currently only in the full photo view). This is especially useful for letting other people tag themselves in your library. For security reasons, editing is disabled by default.

As time goes by, a lot more features can be added to Live Web Gallery extension, especially related to editing photo metadata (tagging, editing descriptions, flagging for deletion).

As far as I know, being able to share your photos on the local network without any software or OS requirements is a unique feature of F-Spot now. No other photo management application can do this to date.

Downloading

The source code is on Gitorious, in the live_web_gallery branch (until it has been merged into the mainline).

To install, use the Edit->Manage Extensions menu in F-Spot, click on Install Add-ins and then Refresh. After that LiveWebGallery should be available under the Tools category.

Or, alternatively, you can download the precompiled binary and put it into:
~/.config/f-spot/addins or /usr/lib/f-spot/extensions

Note: F-Spot 0.6 is required for it to work. You can already find deb/rpm packages for F-Spot 0.6 or 0.6.1 for most distributions and it will be included in the upcoming distro releases this autumn.

Hopefully, the extension will later be distributed with newer versions of F-Spot by default.

Enjoy! Comments are welcome!

Monday, June 1, 2009

Database Refactoring

A couple of months ago I gave a short keynote titled Dinosaur Strategies: How Can Data Professionals Still Prosper in Modern Organisations, inspired by Scott Ambler's joke on the fictional Waterfall 2006 conference website.

(see the slides)

I primarily deal with the 'application' aspects of software development using Agile practices, so I have a hard time understanding how some Data Professionals can be so far behind in their evolution, not doing basic things like iterative development, unit tests, continuous integration, etc.

Last week I was asked to give a talk on Database Refactoring. The topic seemed challenging enough, and as no Database Professionals cared to lead the topic, I decided to give it a try. The result is a motivational speech for database developers as well as others in the software development process.

I have discussed the cultural conflict of database and OOP developers, the problem of refactoring tools available to relational database developers lagging behind, and some solutions to these problems that can help before these tools become available:

(1) Development Sandboxes
(2) Regression Testing
(3) Automatic Changelog, Delta scripts
(4) Proper Versioning
(5) Continuous integration
(6) Teamwork & Cultural Changes

Other discussed topics include Refactoring of Stored Code vs Database Schema, Agile Reality, Overspecialization (o16n), Database not being under control, Database Smells, Fear of Change, Scenarios, Dealing with Coupling, Dealing with unknown applications, Proper versioning, Continuous Integration using sandboxes, and Delta Scripts (Migrations), which make evolutionary database schema possible.

The dinosaurs below are a reminder of my previous keynote, available above. They come from the very nice Dinosaurs Song, available on YouTube, which I actually played after the keynote itself.

Below are full slides of the Database Refactoring talk.

(click for PDF slides)

Sunday, May 17, 2009

Versioning your home directory or documents with Git

Git is a relatively new Version Control System, initially started by Linus Torvalds in order to manage the source code of Linux Kernel.

Although Randal Schwartz has stated that Git was not designed to version your home directory, it seems that many people are now trying to do so :-)

Some people have used CVS or Subversion for this purpose in the past, but to my mind, Git is suited better for this task for several reasons:

Git is grep-friendly (only stores its metadata in a single .git directory at the root of the working copy)
It is very easy to work with a local repository (just do git init and you're ready)
Git stores changes very efficiently (even binary files), so not much disk space is wasted, but don't forget to call git gc from time to time
Git repository is always available on your computer, even when you are offline, but on the other hand it is very easy to push your changes to a remote repository as well

All these things are much worse with CVS, which spams all versioned directories with CVS subdirs and stores each version of binary files fully. Subversion also requires more effort to set up, is less storage-efficient, and puts .svn subdirs everywhere.

Having said that, my setup is ultra-simple compared to others on the net!

To start versioning your home directory, just run this in the root of your home:

git init

This will initialize an empty local Git repository in ~/.git/ - this is the location that you can use when doing backups, but otherwise you shouldn't care about it anymore.

Then you need to tell Git to track your important files:

git add Documents
git add bin
git add whatever else you want to version
git commit -m "Adding initial files"

Then you can work normally with your tracked files and occasionally commit your changes to the repository with

git commit -a -m "description of changes you have done"

Note the "-a" above: it tells Git to commit the changes made to all previously tracked files, so you don't have to use git add again. But don't forget to git add any new files you create before committing.

Use git status to show what files were changed since your last commit. Unfortunately, it will also list all untracked files in your home directory, so you may need to create a .gitignore file. You can get the initial version of this file using this command:

git status | awk '/#/ {sub("/$", ""); print $2}' > .gitignore

Then edit it and possibly replace parts of some full names with '*'. Don't forget to git add and git commit this file as well!
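
As far as I know, newer Git versions no longer prefix the git status output with '#', so the awk command above may produce an empty file there. Below is a minimal sketch of the same idea based on git status --porcelain, where untracked entries start with '??'; the function name untracked_from_porcelain is made up for illustration, and names that Git puts in double quotes (e.g. those containing spaces) are kept with their quotes and need manual editing.

```shell
#!/bin/bash
# untracked_from_porcelain: reads `git status --porcelain` output on stdin and
# prints the untracked entries (lines starting with "??") without a trailing slash
untracked_from_porcelain() {
    awk '/^\?\? / { name = substr($0, 4); sub("/$", "", name); print name }'
}
```

Use it as git status --porcelain | untracked_from_porcelain > .gitignore, then review the result as described above. The .gitignore file itself will probably appear in the list, so remove that line.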

That's, basically, it! You may also try some GUI tools provided by git, e.g. gitk or git gui, to browse your changes and make some changes if you can't remember the commands.

Moreover, I have some more ideas on how to make all this more automatic, which I am going to try later:

Put git commit -a -m "..." into the user's crontab in order to commit changes automatically, e.g. daily (the -m message is needed, because cron can't open an editor)
Create a couple of Nautilus scripts (located in ~/.gnome2/nautilus-scripts) to make adding, committing and other actions available directly from the Nautilus file manager in Gnome.

Happy versioning! And read the Git tutorial with either man gittutorial or on the official site.

Sunday, April 26, 2009

Excessive memory usage by Oracle driver solved

On my day job I deal with Internet banking. The Internet bank is a relatively large and high-load Java/Spring/Hibernate web application, which uses Oracle databases.

During our recent transition from a centralized data accessor (VJDBC) to local JDBC connection pools to reduce data roundtrip times, we started having issues with memory usage in our application servers: some requests started to allocate tens to hundreds of megabytes of memory. While the garbage collector was successfully reclaiming all this memory afterwards (no memory leaks), it still posed a problem of high peak memory usage as well as too frequent collections, also affecting the overall performance.

While profiling memory allocations with JProfiler, I discovered that OracleStatement.prepareAccessors() is responsible for these monstrous allocations (up to 600 MB at once, mostly in giant char or byte arrays). Google pointed to this nice article on reducing the default prefetch size, describing a very similar situation; however, those guys had problems with queries returning LOBs. We haven't used any LOBs in our problematic queries and haven't knowingly modified the defaultRowPrefetch connection property.

Further investigation led to the way we were using Hibernate: for some queries that are expected to return large result sets, we were using the Query.setFetchSize() or Criteria.setFetchSize() methods with rather high values (e.g. 5000). This seemed reasonable, because we were also using the setMaxResults() method with the same value to limit the maximum length of the returned ResultSet. However, after some upgrades of Java, Hibernate, and the Oracle driver, this started having these memory allocation side effects. It seems that Hibernate now translates this fetchSize parameter directly to OracleStatement's rowPrefetch value, forcing it to instantly allocate a rowPrefetch*expectedRowSize sized array even before it runs the actual query. This array can be ridiculously large, even if the actual query returns only a few rows afterwards. Later investigation showed that having the batch-size attribute in the Hibernate mapping files (hbm.xml) has exactly the same effect and also results in giant pre-allocations.
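
To get a feeling for the numbers, the rowPrefetch*expectedRowSize product can be calculated by hand. A minimal sketch follows; the function name prealloc_bytes and the row width of 120000 bytes are made-up illustration values (think of 30 text columns of up to 2000 two-byte chars each), not something measured from the driver.

```shell
#!/bin/bash
# prealloc_bytes ROW_PREFETCH ROW_BYTES: size of the buffer allocated up front,
# following the rowPrefetch*expectedRowSize rule described above
prealloc_bytes() {
    echo $(( $1 * $2 ))
}
```

With these hypothetical values, prealloc_bytes 5000 120000 prints 600000000 (about 600 MB), while prealloc_bytes 10 120000 prints 1200000 (about 1.2 MB).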

As a result, we had to review all batch-size and setFetchSize() values that we were using with our Hibernate queries and mappings, in most cases reducing them significantly. This would reduce the worst-case performance of some long queries (they would require more roundtrips to the database), but would also reduce the overall amount of garbage accumulating in the heap and thus reduce the frequency of garbage collections, having a positive impact on CPU load. Shorter results would run equally fast, so it actually makes sense to rely on average statistics of the actual responses when choosing optimal rowPrefetch values. The default value is 10, which is hardcoded in the Oracle driver.

For longer queries, the above-mentioned article proposed the idea of geometrically increasing the rowPrefetch (manually setting it twice as big for each subsequent fetch). This is a nice idea, but I wonder why the Oracle driver can't do this automatically. This is how Java collections behave when they resize themselves. I haven't tried doing this with Hibernate yet, but I think it should be possible, especially if you use Query.scroll() instead of Query.list().
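
How much the doubling could save is easy to estimate. Here is a minimal sketch of a simplified model that only counts the fetches needed to deliver a given number of rows and ignores everything else the driver does; the function names fixed_roundtrips and doubling_roundtrips are made up for illustration.

```shell
#!/bin/bash
# fixed_roundtrips ROWS PREFETCH: fetches needed with a constant prefetch size
fixed_roundtrips() {
    local rows=$1 prefetch=$2
    echo $(( (rows + prefetch - 1) / prefetch ))
}

# doubling_roundtrips ROWS START: fetches needed when the prefetch size
# starts at START and doubles after every fetch
doubling_roundtrips() {
    local rows=$1 prefetch=$2 trips=0
    while [ "$rows" -gt 0 ]; do
        rows=$(( rows - prefetch ))
        prefetch=$(( prefetch * 2 ))
        trips=$(( trips + 1 ))
    done
    echo "$trips"
}
```

In this model, fixed_roundtrips 5000 10 prints 500, while doubling_roundtrips 5000 10 prints 9, and a query returning only a few rows still starts with the small buffer.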