There is a nifty piece of software called zsync, which is kind-of like rsync, except it is totally different.
Rsync is mainly useful when you want to synchronize a list of files, or directories, between two servers. It will only download the new files and files which have changed. It will even delete or back up the files which have been removed at the original site. Nice.
For a project I was involved in at work until recently, we had a slightly different problem: we generate a huge file (an ISO image) which contains about 6 GB of data. This ISO image contains the daily build of our application. It contains only a handful of files. The problem is that some of them are generated and gigabytes in size, yet from day to day only maybe 100-150 MB change (and it would be even less were it not for this “feature” of .NET that never generates identical binaries even from exactly the same source code).
Rsync was not useful in this case: it would download the whole file, gigabytes! (some of the people downloading the ISO are on a slow link in India)
This is exactly the case zsync targets: zsync will only download the changed parts of the file thanks to the rolling checksum algorithm.
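To give an idea of what “rolling” means here, this is a minimal sketch in C of an rsync-style weak checksum. It is not zsync’s actual code: the names rsum, rsum_init and rsum_roll are made up for this illustration, and a real tool pairs the weak checksum with a strong one to confirm matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Weak rolling checksum in the style of rsync (illustrative names). */
typedef struct {
    uint16_t a;   /* sum of the bytes in the window */
    uint16_t b;   /* sum of every intermediate value of a */
} rsum;

/* Compute the checksum of a whole window from scratch: O(len). */
static rsum rsum_init(const unsigned char *data, size_t len)
{
    rsum s = {0, 0};
    for (size_t i = 0; i < len; i++) {
        s.a = (uint16_t)(s.a + data[i]);
        s.b = (uint16_t)(s.b + s.a);
    }
    return s;
}

/* Slide the window one byte to the right: O(1).
 * `out` is the byte leaving the window, `in` the byte entering it,
 * `len` the (fixed) window length. All arithmetic is modulo 2^16. */
static rsum rsum_roll(rsum s, unsigned char out, unsigned char in, size_t len)
{
    s.a = (uint16_t)(s.a - out + in);
    s.b = (uint16_t)(s.b - len * out + s.a);
    return s;
}
```

Because sliding the window costs the same no matter how big the block is, the client can test every byte offset of the old file against the list of block checksums of the new file, and only ask the server for the blocks it could not find locally.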
Best of all: no need for an rsync server, opening port TCP 873 (which requires months of arguing with BOFHs in some companies), or anything special: HTTP over port 80 and you are done. Provided that you are not using Internet Information Server, which happens to support only 6 ranges in an HTTP request (hint: configure nginx in reverse proxy mode).
But I’m digressing.
Cool. Great. Awesome. Zsync. The perfect tool for the problem.
Except this project is for Windows, people work on Windows, they are horrified by anything non-Windows, and zsync is only available for Unix platforms.
In addition to that, the Cygwin port suffers from many connection errors on Windows 7 and does not work from a cmd.exe prompt: it wants the Cygwin Bourne shell prompt.
So I started to port zsync to Windows natively.
Native port howto
The starting point was:
- C99 code
- autotools build system
- No external dependencies (totally self-contained)
- Heavy use of POSIX and Unix-only features (such as reading from a socket via file descriptors, renaming a file while open, deleting a file while open and replacing it with another file yet still use the same file descriptor, etc)
To avoid breaking too much, and because I wanted to contribute my changes upstream, my intention was to do the port step by step:
- MSYS/MinGW gcc/CMake
- Visual C++/CMake
Autotools was the first stone in the path.
With some work (calling MSYS from a DOS prompt, etc) it would have been possible to make it generate a Visual C++ Makefile but it would have been painful.
Plus the existing autotools build system did not detect the right configuration on MinGW.
Step 1: replace autotools with CMake. On Linux. This was relatively easy (although time consuming) and did not require any change in the code.
The second step was to build zsync on Windows using Cygwin (which provides a POSIX compatibility layer) and CMake.
No code changes were required here either, only a few small adjustments to the CMake build system. I tested on Linux again, it worked fine.
At this point, I had only made Pyrrhic progress: zsync was still Unix-only, but with a cross-platform build system.
My next step was a serious one: port zsync to use MinGW, which generates a native Windows application with gcc.
That means using Winsock where required.
And hitting Microsoft’s understanding of “POSIX-compliant”: the standard Windows POSIX C functions do not allow treating sockets as files or renaming open files, temporary files are created in C:\ (which fails on Windows Vista and newer), etc. And that’s when the functions do exist. In many cases (mkstemp, pread, gmtime_r…) those functions simply did not exist and I needed to provide an implementation.
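As an illustration of the kind of shim this requires, here is a sketch of replacements for gmtime_r and pread. These are not the implementations from the port: compat_gmtime_r and compat_pread are hypothetical names, the Windows branch assumes the CRT’s gmtime_s and the lseek/read names from io.h (as MinGW provides them), and the pread emulation is not atomic, so it is only safe if no other thread touches the same descriptor.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for gmtime_r on POSIX systems */
#endif
#include <stdio.h>      /* SEEK_SET, SEEK_CUR */
#include <time.h>
#include <sys/types.h>  /* off_t */

#ifdef _WIN32
#include <io.h>         /* lseek, read (assumed: MinGW's POSIX-style names) */
#else
#include <unistd.h>
#endif

/* Hypothetical gmtime_r replacement: same contract, returns `out` or NULL. */
static struct tm *compat_gmtime_r(const time_t *t, struct tm *out)
{
#ifdef _WIN32
    /* gmtime_s takes its arguments in the opposite order, returns 0 on success */
    return gmtime_s(out, t) == 0 ? out : NULL;
#else
    return gmtime_r(t, out);
#endif
}

/* Hypothetical pread replacement: seek, read, then restore the offset.
 * Unlike the real pread, this is NOT atomic. Returns bytes read or -1. */
static long compat_pread(int fd, void *buf, unsigned int count, off_t offset)
{
    off_t saved = lseek(fd, 0, SEEK_CUR);   /* remember where we were */
    long n;

    if (saved == (off_t)-1)
        return -1;
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;
    n = (long)read(fd, buf, count);
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)  /* put the offset back */
        return -1;
    return n;
}
```

The rest of the code can then call the compat_ functions everywhere, and the #ifdef noise stays in one file.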
Plus adapting the build system. Fortunately, I was still using gcc and Qt Creator provides great support for MinGW and gdb on Windows, and decent support for CMake.
Some other “surprises” were large file support, a stupid “bug” and the difficulties of emulating all the file locking features of Unix on Windows.
Regarding LFS, I took the easy path: instead of using 64-bit Windows API directly, I used the mingw-w64 flavor of gcc on Windows, which implements 64-bit off_t on 32-bit platforms transparently via _FILE_OFFSET_BITS.
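A small sketch of what that looks like in practice, assuming the macro is set before any system header is included (in a real project it would more likely be passed by the build system as -D_FILE_OFFSET_BITS=64). The typedef and the helper six_gigabytes are made up for this example.

```c
#define _FILE_OFFSET_BITS 64   /* must come before the first system header */
#include <sys/types.h>
#include <stdio.h>

/* Compile-time check: the array size is negative, and compilation fails,
 * if off_t is narrower than 64 bits. */
typedef char off_t_is_64_bit[sizeof(off_t) >= 8 ? 1 : -1];

/* Hypothetical helper: an offset that does not fit in 32 bits,
 * roughly the size of the ISO image discussed above. */
static off_t six_gigabytes(void)
{
    return (off_t)6 * 1024 * 1024 * 1024;
}
```

With a 32-bit off_t this file would refuse to compile, which is a lot nicer than finding out at run time that seeks past 2 GB silently wrap around.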
Visual C++ misery
Porting to Visual C++ was the last step.
This was not strictly required. After all, all I had been asked for was a native version, not a native version that used Visual C++.
Yet I decided to give VC++2010 a try.
The main problems were the lack of C99 support (though you can partially work around that by compiling as C++) and importing symbols, because the shared library did not export them explicitly (attributes for symbol visibility were introduced in gcc 4.0, but many libraries do not use them because gcc does its “magic”, especially MinGW, which will “guess” the symbols).
Porting to Visual C++ 2010 required either giving up some of the C99 features in use (e.g. moving variable declarations to the beginning of the functions) or adding a lot of C++-specific workarounds (extern “C”).
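The usual shape of those workarounds is sketched below: an export macro, an extern “C” guard, and a function body written with C89-style declarations. None of this is taken from zsync: ZS_API, ZS_BUILDING_DLL and zs_block_count are hypothetical names, and the header and the implementation are shown in a single file only to keep the example short.

```c
/* The library's own .c files define this before including the header;
 * client code leaves it undefined and gets dllimport instead. */
#define ZS_BUILDING_DLL

/* --- what would live in the public header --- */
#if defined(_WIN32)
#  ifdef ZS_BUILDING_DLL
#    define ZS_API __declspec(dllexport)
#  else
#    define ZS_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define ZS_API __attribute__((visibility("default")))
#else
#  define ZS_API
#endif

#ifdef __cplusplus
extern "C" {            /* keep C linkage when compiled as C++ */
#endif

ZS_API int zs_block_count(long file_len, int block_size);

#ifdef __cplusplus
}
#endif

/* --- what would live in the library source --- */
int zs_block_count(long file_len, int block_size)
{
    int blocks;         /* C89 style: declarations before any statement */

    if (block_size <= 0)
        return -1;
    blocks = (int)((file_len + block_size - 1) / block_size);
    return blocks;
}
```

With explicit exports, Visual C++ generates an import library that lists exactly the marked functions, instead of relying on the toolchain to guess.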
I was a bit worried upstream would not accept this code because it didn’t really provide any benefit for the application (the only gain was for the developer: a great IDE and a very powerful debugger), therefore I didn’t finish the Visual C++ port. Maybe some day, if Microsoft decides to finally provide C99.
The result (so far) is available in the zsync-windows space in Assembla.