I don’t blog often, but when I do it tends to be meaty.  I won’t disappoint.  I’ll be covering Amarok, Amarok history, and a possible future part of kdelibs.

"We can rebuild him. We have the technology. Better than before. Better, stronger, faster."

A little-known feature in Amarok 1, starting at about 1.4.3, was what was known as Amarok File Tracking, or AFT.  For every single file in your collection, on scan, a unique identifier (UID) was generated from some of the file’s attributes.  If you moved your tracks around your folders, as the incremental scan kicked in, the UID would allow for the file to be identified, and integration throughout Amarok would mean that your statistics, your cached lyrics, and the current playlist would all be updated with the new path.  No longer did you have to worry that moving around your files would mean losing years of statistics.  Or losing your files.

But I’m getting ahead of myself.

See, AFT wasn’t born AFT.  AFT could not track a file whose metadata and location had both changed since the last scan, because the UID was derived from file properties such as file length, hashed together with a portion of the file itself.  Edit the tags and the hashed portion could change, giving you a different UID; move the file as well and there was nothing left to match it against.  So you could still lose track of your files.  This was a limitation that was known in advance.
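To make that limitation concrete, here is a rough sketch of this style of UID generation.  It is not Amarok’s actual code: the function name `generate_uid`, the 16 KiB sample size, and the choice of MD5 for the UID are all assumptions made for illustration.

```python
import hashlib
import os


def generate_uid(path, sample_bytes=16384):
    """Return a hex UID built from the file's length and its leading bytes.

    Hypothetical sketch: the properties Amarok really used, the amount of
    the file it hashed, and the hash function are not given in this post.
    """
    digest = hashlib.md5()
    # File property: the total length in bytes.
    digest.update(str(os.path.getsize(path)).encode("ascii"))
    # A portion of the file itself (here, simply the first sample_bytes).
    with open(path, "rb") as handle:
        digest.update(handle.read(sample_bytes))
    return digest.hexdigest()
```

Moving or renaming a file leaves this UID alone, since the path is never hashed.  Rewriting a tag that sits inside the sampled region, or that changes the file length, produces a new UID, which is exactly the "changed and moved" case that could not be tracked.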

It was also a limitation that didn’t originally exist.  As I said, AFT wasn’t born AFT.  It was born as Advanced Tag Features, or ATF.  ATF was the same idea, but a little different — it would store the generated UID directly in the file’s metadata.  This allowed for superb file tracking capabilities, because unlike generating a UID from a part of a file, if that part of the file changed, you’d still have your UID.  In fact, the only way you *couldn’t* track your file was if you either removed the file’s tag entirely (or some other program removed the UID when it shouldn’t), or if you removed the corresponding information from Amarok’s database. (There are some downsides to this scheme: only certain file types are supported, for instance, determined by the kind of tag they use and the tag’s ability to store this kind of information.)

So why the change?  Well, ATF had a problem, which was related to the structure of Amarok itself, and Amarok’s historical penchant for crashing (which got much better as the 1.4 series progressed).  The outcome is possibly worthy of an entry in The Daily WTF.  In gory detail, here’s the problem.

   1. Amarok would start a collection scan.  The collection scanner was the entity responsible for adding the UIDs to the file metadata.  Important note: the collection scan was a separate process.
   2. Amarok would crash, leaving the collection scan running, although not communicating with anything.  This scanner could be very slow if it was adding the UIDs, depending on whether padding had to be added to the file’s tag.  If this was the case, the entire file would have to be rewritten.
   3. Amarok would be restarted by the user.  Another collection scan process would start.  Because UIDs would already exist for the early files, it would very quickly catch up to the first collection scan process.
   4. You now had two collection scan processes generating and writing UIDs at the same time to the same file.  If you were lucky, this would mess up your tag.  If you were unlucky, this toasted your entire file.
   5. Repeat step 4 for the rest of the scan.

ATF was never released in this state, but it did get turned on in SVN.  And a few unlucky users had far too many files end up corrupted, depending on how crashy things became for them.  After we finally realized what the issue was, a user came forward on the mailing list (still trying to find the exact mail or user) proposing a solution that I believe they’d seen in a class.  Essentially, the solution makes its modifications to a temporary, uniquely named copy instead of the original file, uses MD5 checksums to find out if the original file has changed while the new file was being written, and then relies on filesystem atomicity guarantees to move the new file back over the old one.  This became the MetaBundleSaver, and it worked quite well, but it was also extremely slow compared to a normal scan.  And most importantly, no one was quite trusting of the whole ATF scheme any more.
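For the curious, the general shape of that technique looks something like the sketch below.  This is an illustration of the idea only, not the MetaBundleSaver source: the names `md5_of` and `safe_modify` and the `modify` callback are made up for the example, and the atomic move assumes the temporary file lives on the same filesystem as the original.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path):
    """Return the MD5 hex digest of a whole file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_modify(path, modify):
    """Run modify(temp_path) on a private copy, then swap it into place.

    Returns True if the copy replaced the original, or False if the
    original changed in the meantime (it is then left untouched).
    """
    before = md5_of(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Uniquely named temp file in the same directory, so the final
    # rename stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".safesave-")
    os.close(fd)
    try:
        shutil.copyfile(path, temp_path)
        shutil.copymode(path, temp_path)  # keep the original permissions
        modify(temp_path)                 # all writes go to the copy
        if md5_of(path) != before:
            return False                  # someone else touched the original
        os.replace(temp_path, path)       # atomic rename over the original
        temp_path = None
        return True
    finally:
        if temp_path is not None and os.path.exists(temp_path):
            os.remove(temp_path)
```

Two writers racing on the same file each work on their own copy, so the worst case is that one of them loses and gives up, rather than both interleaving writes into the same bytes.  The checksum pass over the full file, done twice, is also a fair hint at why this kind of saving is much slower than a plain scan.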

So, ATF was renamed to AFT and with it came a new algorithm that wouldn’t touch anyone’s files, but couldn’t track as well.

A couple of weeks ago, I added AFT to Amarok 2’s SqlCollection.  Enjoy, everyone: statistics, lyrics, and the playlist are already supported, with support for stored playlists coming eventually.  But there’s more.

Fast forward to today (okay, two days ago).  I’m taking a shower (Wade does insist that there’s something about showers and KDE coders) and I had a thought, which was essentially: there’s absolutely no reason why Amarok 2 can’t use a UID inside a file, if one exists, for superior tracking, and if not, generate a read-only one, never written to the file, for normal tracking.
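In rough terms, the lookup order could be sketched like this.  None of it is the SqlCollection code: `resolve_uid`, `rescan`, the dictionary standing in for the collection database, and the two callables for reading an embedded UID and generating one are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional, Tuple


def resolve_uid(
    path: str,
    read_embedded: Callable[[str], Optional[str]],
    generate: Callable[[str], str],
) -> Tuple[str, bool]:
    """Return (uid, is_embedded), preferring a UID stored in the file's tag."""
    embedded = read_embedded(path)
    if embedded:
        return embedded, True
    # No UID in the tag: fall back to one derived from the file, which is
    # only kept in the collection and never written back to the file.
    return generate(path), False


def rescan(
    collection: Dict[str, dict],
    path: str,
    read_embedded: Callable[[str], Optional[str]],
    generate: Callable[[str], str],
) -> str:
    """Record path under its UID, keeping any statistics already stored."""
    uid, is_embedded = resolve_uid(path, read_embedded, generate)
    entry = collection.setdefault(uid, {"playcount": 0})
    entry["path"] = path            # a known UID at a new path is a move
    entry["embedded"] = is_embedded
    return uid
```

A file that carries its UID in the tag keeps the same collection entry no matter how much the rest of the file changes, while files without one still get the weaker, generated kind of tracking.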

So I created a utility that is built and installed with Amarok 2.  It’s called amarok_afttagger, and it will write UIDs into your files, using a class ported from MetaBundleSaver and called SafeFileSaver to ensure that files are not overwritten/interleaved, even if you run the process twice or three times at once.  It optionally supports recursion if you want to pass in directories, and it can also remove UIDs from your files if you like.  Right now it supports MP3s only, but Vorbis and FLAC support will be coming soon.

I’ve tested it extensively.  I’ve added UIDs to files, removed them from files, regenerated the ones in files, over and over, and still everything is cherry.  And Amarok 2, when it finds these files, can do some awesomely robust file tracking.

I encourage people to give it a run on their MP3s and check it out — if you’re worried by all the Dark Ages info up above and don’t have faith in the implemented solution, back up your files first, or operate on a copy of them, until you’re satisfied it won’t harm your files.  And if you still don’t want to do it, you can enjoy the less awesome but still awesome power of the non-embedded UID file tracking.

Now, I promised this would talk about a possible KDE library.  I’ll eventually be submitting the SafeFileSaver class for hopeful inclusion into kdelibs, so that any application that is worried about data integrity and needs to write to a user’s files can take advantage of it.  It’s very simple to use — you simply give it a file path, and then operate on the file path that’s returned to you when you call prepareToSave(), instead of the original one.  When you’re all done, you call doSave() and it will perform the necessary functions.  That’s it.

Hope this has been enjoyable, and enjoy AFT in Amarok 2.  Play with it and be amazed.  Use amarok_afttagger on your files and be even more amazed.  More information is available here: http://amarok.kde.org/wiki/AFT