At various points the sysadmins have received requests to make repository tarballs available. The reasoning is that some repositories can be very large; I believe test runs of the KOffice conversion produced a repository that is 350MB in size. Once you get this down to your system, updating with new refs is relatively fast, but what about that first download? What if you’re on a slow dial-up link somewhere (as some of our contributors are) and/or only have access to the Internet sporadically?

A solution to this is to provide users with tarballs that contain a snapshot of the repository. Once downloaded and expanded, you have a viable git clone that is pre-configured to fetch new refs from anongit.kde.org. To keep the tarball as small as possible, it doesn’t include a checked-out working tree, only the .git directory and a tiny init script that will a) delete itself and b) check out the working tree. So: run the init script, then run a pull, and you’re done.
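To make the tarball layout concrete, here is a minimal Python sketch of how such a snapshot could be packed. It is not the script the sysadmins actually use; the function name `build_snapshot` and the init script name `initrepo.sh` are made up for illustration. The only point it shows is that the archive carries the .git directory and the init script, and nothing from the working tree.

```python
import os
import tarfile


def build_snapshot(repo_dir, out_path, init_script="initrepo.sh"):
    """Pack only .git and the init script into a gzip-compressed tarball.

    `init_script` is a hypothetical file name; the real script's name is
    not given in this post.
    """
    repo_name = os.path.basename(os.path.normpath(repo_dir))
    with tarfile.open(out_path, "w:gz") as archive:
        # The repository data itself (added recursively).
        archive.add(os.path.join(repo_dir, ".git"),
                    arcname=f"{repo_name}/.git")
        # The tiny script the user runs once after unpacking.
        archive.add(os.path.join(repo_dir, init_script),
                    arcname=f"{repo_name}/{init_script}")
    return out_path
```

Anything else in `repo_dir`, such as checked-out source files, is simply never added, which is what keeps the download small.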

Now, the idea is to be able to start a download at one time and finish it later. The key here is that HTTP file transfers are resumable, so if a tarball download is interrupted you can pick up where you left off. However, there was a question: how do we show a consistent link on Projects without a priori knowledge of the tarball’s characteristics (since we can’t know any of them from within Redmine)? In Redmine, all we could really do, without massive hackery, is display a statically-named tarball; in this case, the name always ends in “-latest.tar.gz”. You can see this at https://projects.kde.org/projects/playground/network/aki/repository or any other project’s repository tab; note the “Tarball” checkout method.

This means, however, that the client needs some way of knowing whether the file behind that static name is still the same tarball or has been regenerated since the download started. An easy way to do this is to use the If-Unmodified-Since header; however, the web server currently in use doesn’t support it (I checked, and the author didn’t believe the header was widely used; I pointed him to cURL/libcurl). The other problem with this approach is that it’s not very visible to the user.
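For illustration, this is roughly what a resuming client would send. The sketch below only builds the request; the helper name `build_resume_request` and the `.part` file convention are my own assumptions, not part of any KDE tooling. A server that honors both headers would normally answer 206 Partial Content when the file is unchanged and 412 Precondition Failed when it has been regenerated.

```python
import os
import urllib.request


def build_resume_request(url, partial_path, last_modified=None):
    """Build a GET request that continues a partial download, if one exists.

    `last_modified` is the Last-Modified value remembered from the first
    attempt (an HTTP date string); how the client stores it is left open.
    """
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        # Ask only for the bytes we do not have yet.
        request.add_header("Range", f"bytes={offset}-")
        if last_modified is not None:
            # Only continue if the file has not changed in the meantime.
            request.add_header("If-Unmodified-Since", last_modified)
    return request, offset
```

The returned request can be passed to `urllib.request.urlopen`, with the response body appended to the partial file starting at `offset`.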

So, I first changed the script so that the generated tarballs are no longer actually named *-latest.tar.gz; instead, each one gets a more descriptive name, and a -latest.tar.gz symlink points to the actual tarball. I then wrote a simple web service that the web server proxies to whenever a “-latest.tar.gz” file is requested. This service resolves the symlink and redirects the client, so the user sees immediately what the name of the actual file is. For instance, right now a request for the aki repository tarball via http://anongit1.kde.org/aki/aki-latest.tar.gz forwards to http://anongit1.kde.org/aki/aki_20101202050239_sha1-5f866b3a42872f8fd54adcf30bcb8a3a79d02542.tar.gz
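The resolve-and-redirect idea fits in a few lines. What follows is a sketch using Python’s standard library, not the service that is actually deployed: the root directory `/srv/tarballs`, the names `resolve_latest` and `LatestRedirectHandler`, and the choice of a 302 status are all assumptions made for the example.

```python
import os
from http.server import BaseHTTPRequestHandler

TARBALL_ROOT = "/srv/tarballs"  # hypothetical location of the tarballs


def resolve_latest(root, url_path):
    """Map /repo/repo-latest.tar.gz to the URL path of the real tarball.

    Returns None if the path is not a -latest symlink inside `root`.
    """
    if not url_path.endswith("-latest.tar.gz"):
        return None
    root = os.path.realpath(root)
    link = os.path.normpath(os.path.join(root, url_path.lstrip("/")))
    # Refuse requests that try to escape the tarball directory.
    if os.path.commonpath([root, link]) != root or not os.path.islink(link):
        return None
    target = os.path.realpath(link)
    if os.path.commonpath([root, target]) != root:
        return None
    return "/" + os.path.relpath(target, root).replace(os.sep, "/")


class LatestRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        location = resolve_latest(TARBALL_ROOT, self.path)
        if location is None:
            self.send_error(404)
            return
        # Temporary redirect: the symlink target changes with every rebuild.
        self.send_response(302)
        self.send_header("Location", location)
        self.end_headers()
```

To try it locally, something like `http.server.HTTPServer(("127.0.0.1", 8000), LatestRedirectHandler).serve_forever()` would serve the redirects, with the front-end web server proxying only “-latest.tar.gz” requests to it.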

Note the components of that filename: the name, the date/time stamp (to the second), “sha1” indicating the algorithm to check the hash against, and the hash of the file for easy verification of the completed download. This should make it easy for both the client and the user to confirm that they are still dealing with the same file, and easy for the user to verify the integrity of the full file, since they can check the sha1sum against the filename itself.
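As a sketch of how a client could use that naming scheme, the snippet below splits a filename into its parts and checks a downloaded file against the embedded hash. The regular expression is inferred from the single example filename above, so treat the exact pattern (and the helper names `parse_tarball_name` and `verify_tarball`) as assumptions rather than a specification.

```python
import hashlib
import os
import re

# Pattern inferred from the example: <name>_<YYYYMMDDhhmmss>_<algo>-<hex>.tar.gz
TARBALL_NAME = re.compile(
    r"^(?P<name>.+)_(?P<stamp>\d{14})_(?P<algo>[a-z0-9]+)-(?P<digest>[0-9a-f]+)\.tar\.gz$"
)


def parse_tarball_name(filename):
    """Return the name, timestamp, algorithm and digest encoded in a filename."""
    match = TARBALL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized tarball name: {filename}")
    return match.groupdict()


def verify_tarball(path):
    """Hash the file at `path` and compare it with the digest in its name."""
    fields = parse_tarball_name(os.path.basename(path))
    hasher = hashlib.new(fields["algo"])
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large tarballs do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == fields["digest"]
```

On the command line, comparing the output of `sha1sum` with the hash in the filename does the same job.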

I think this provides a pretty nice solution to the problem. :-)