Guidelines on cross-platform porting

There is a nifty piece of software called zsync, which is kind-of like rsync, except it is totally different.

Rsync

Rsync is mainly useful when you want to synchronize a list of files or directories between two servers. It will only download the new files and the files which have changed. It will even delete or back up the files which have been removed at the original site. Nice.

For a project I was involved in at work until recently, we had a slightly different problem: we generate a huge file (an ISO image) which contains about 6 GB of data. This ISO image contains the daily build of our application. It contains only a handful of files. The problem is that some of them are generated and gigabytes in size, yet from day to day only maybe 100-150 MB change (and it would be even less were it not for this “feature” of .NET that never generates identical binaries, even from exactly the same source code).

Rsync was not useful in this case: it would download the whole file, gigabytes! (some of the people downloading the ISO are on a slow link in India)

 

zsync

This is exactly the case zsync targets: zsync will only download the changed parts of the file thanks to the rolling checksum algorithm.
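To make “rolling checksum” a bit more concrete, here is a minimal sketch in C of the weak checksum described in the rsync technical report: two 16-bit sums that can be updated in constant time when the window slides by one byte. This is not zsync's actual code; the names `rsum`, `rsum_block` and `rsum_roll` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Weak rolling checksum in the style of the rsync algorithm.
 * a = sum of the bytes in the window, b = sum of the running values of a,
 * both taken modulo 2^16 (hence uint16_t). Names are hypothetical. */
struct rsum {
    uint16_t a;
    uint16_t b;
};

/* Compute the checksum of a whole block from scratch: O(len). */
struct rsum rsum_block(const unsigned char *data, size_t len)
{
    struct rsum r = {0, 0};
    for (size_t i = 0; i < len; i++) {
        r.a = (uint16_t)(r.a + data[i]);
        r.b = (uint16_t)(r.b + r.a);
    }
    return r;
}

/* Slide the window one byte to the right: O(1).
 * 'out' is the byte leaving the window, 'in' the byte entering it,
 * 'len' the (fixed) window length. */
struct rsum rsum_roll(struct rsum r, unsigned char out, unsigned char in,
                      size_t len)
{
    r.a = (uint16_t)(r.a - out + in);
    r.b = (uint16_t)(r.b - (uint16_t)(len * out) + r.a);
    return r;
}
```

Because sliding the window costs the same no matter how large the block is, a checksum can be evaluated at every byte offset of a multi-gigabyte file, and only the offsets whose weak checksum matches a known block need a stronger (and slower) hash.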

Best of all: no need for an rsync server, no need to open TCP port 873 (which requires months of arguing with BOFHs in some companies), or anything special: HTTP over port 80 and you are done. Provided that you are not using Internet Information Server, which happens to support only 6 ranges in an HTTP request (hint: configure nginx in reverse proxy mode).
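For readers who have never seen one, a multi-range HTTP request simply lists several byte ranges in a single `Range` header. The sketch below builds such a header in C; `byte_range` and `format_range_header` are hypothetical helpers written for this illustration, not part of zsync.

```c
#include <stddef.h>
#include <stdio.h>

/* One inclusive byte range, as used in an HTTP Range header. */
struct byte_range {
    long long first;
    long long last;
};

/* Write "Range: bytes=a-b,c-d,..." into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_range_header(char *buf, size_t size,
                        const struct byte_range *ranges, size_t n)
{
    int used = snprintf(buf, size, "Range: bytes=");
    if (used < 0 || (size_t)used >= size)
        return -1;

    for (size_t i = 0; i < n; i++) {
        size_t room = size - (size_t)used;
        int w = snprintf(buf + used, room, "%s%lld-%lld",
                         i ? "," : "", ranges[i].first, ranges[i].last);
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += w;
    }
    return used;
}
```

A server that caps the number of ranges per request forces the client to split its list of changed blocks over many more requests, which is why such a limit hurts.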

But I’m digressing.

Cool. Great. Awesome. Zsync. The perfect tool for the problem.

 

Hello Windows

Except that this project is for Windows, people work on Windows, they are horrified by anything non-Windows, and zsync is only available for Unix platforms.

Uh oh.

In addition to that, the Cygwin port suffers from many connection errors on Windows 7 and does not work from a cmd.exe prompt; it wants the Cygwin Bourne shell prompt.

So I started to port zsync to Windows natively.

 

Native port howto

The starting point was:

  • C99 code
  • autotools build system
  • No external dependencies (totally self-contained)
  • Heavy use of POSIX and Unix-only features (such as reading from a socket via file descriptors, renaming a file while open, deleting a file while open and replacing it with another file yet still use the same file descriptor, etc)
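As an illustration of the last point, this is the kind of Unix idiom that has no direct equivalent in the Windows C runtime: a file is deleted while it is still open, and the descriptor keeps working. The function name and the temporary path are made up for this sketch; it is not taken from the zsync sources.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a temporary file, delete its name while it is still open,
 * then keep using the descriptor. Returns 0 if the data was still
 * readable after the unlink, -1 on any error. POSIX only. */
int unlink_while_open_demo(void)
{
    char path[] = "/tmp/zsdemoXXXXXX";   /* hypothetical location */
    char buf[4];
    ssize_t n;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    if (write(fd, "data", 4) != 4) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* The name disappears here, but the file lives on until close(). */
    if (unlink(path) != 0) {
        close(fd);
        return -1;
    }

    if (lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    n = read(fd, buf, sizeof buf);
    close(fd);

    return (n == 4 && memcmp(buf, "data", 4) == 0) ? 0 : -1;
}
```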

To avoid breaking too much, and because I wanted to contribute my changes upstream, my intention was to do the port step by step:

  1. Linux/gcc/autotools
  2. Linux/gcc/CMake
  3. Cygwin/gcc/CMake
  4. MSYS/MinGW gcc/CMake
  5. Visual C++/CMake

 

Autotools

Autotools was the first stone in the path.

With some work (calling MSYS from a DOS prompt, etc) it would have been possible to make it generate a Visual C++ Makefile but it would have been painful.

Plus the existing autotools build system did not detect the right configuration on MinGW.

Step 1: replace autotools with CMake. On Linux. This was relatively easy (although time consuming) and did not require any change in the code.

 

Cygwin

The second step was to build zsync on Windows using Cygwin (which provides a POSIX compatibility layer) and CMake.

No code changes were required here either, only a few small adjustments to the CMake build system. I tested on Linux again, it worked fine.

At this point, I had made only Pyrrhic progress: zsync was still Unix-only, but with a cross-platform build system.

 

MinGW

My next step was a serious one: port zsync to use MinGW, which generates a native Windows application with gcc.

That means using Winsock where required.

And hitting Microsoft’s understanding of “POSIX-compliant”: the standard Windows POSIX C functions do not allow treating sockets as files or renaming open files, temporary files are created in C:\ (which fails on Windows Vista and newer), etc. And that’s when the functions do exist. In many cases (mkstemp, pread, gmtime_r…) those functions simply did not exist and I needed to provide an implementation.
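To give an idea of what “providing an implementation” can look like, here is a minimal sketch of a pread replacement built only on lseek and read. It is an assumption about one possible approach, not the code from the port, and `compat_pread` is a hypothetical name. Unlike the real pread it is not atomic, so it is only safe when no other thread uses the same descriptor.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Emulate pread(): read 'count' bytes at 'offset' without (visibly)
 * moving the file position. Not atomic: the position is changed and
 * then restored, so concurrent users of 'fd' would see it move. */
ssize_t compat_pread(int fd, void *buf, size_t count, off_t offset)
{
    ssize_t n;

    /* Remember where the descriptor currently points. */
    off_t saved = lseek(fd, 0, SEEK_CUR);
    if (saved == (off_t)-1)
        return -1;

    if (lseek(fd, offset, SEEK_SET) == (off_t)-1)
        return -1;

    n = read(fd, buf, count);

    /* Put the position back, whether or not the read succeeded. */
    if (lseek(fd, saved, SEEK_SET) == (off_t)-1)
        return -1;

    return n;
}
```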

Plus adapting the build system. Fortunately, I was still using gcc and Qt Creator provides great support for MinGW and gdb on Windows, and decent support for CMake.

Some other “surprises” were large file support, a stupid “bug” and the difficulties of emulating all the file locking features of Unix on Windows.

Regarding LFS, I took the easy path: instead of using 64-bit Windows API directly, I used the mingw-w64 flavor of gcc on Windows, which implements 64-bit off_t on 32-bit platforms transparently via _FILE_OFFSET_BITS.
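In practice this approach boils down to one macro defined before any system header is included (or passed on the compiler command line as -D_FILE_OFFSET_BITS=64). The sketch below shows the idea together with a compile-time check; `off_t_bits` and the typedef name are hypothetical, added only for illustration.

```c
/* Must be defined before the first system header is included. */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>

/* Compile-time check: the array size is negative, and compilation
 * fails, if off_t is not 8 bytes wide. */
typedef char off_t_must_be_64_bit[sizeof(off_t) == 8 ? 1 : -1];

/* Report the width of off_t in bits. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * 8);
}
```

With a 64-bit off_t, code that stores file offsets in off_t keeps working past the 2 GB mark without touching the 64-bit Windows API directly.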

 

Visual C++ misery

Porting to Visual C++ was the last step.

This was not strictly required. After all, all I had been asked for was a native version, not a native version that used Visual C++.

Yet I decided to give VC++2010 a try.

The main problems were the lack of C99 support (though you can partially work around that by compiling as C++) and importing symbols, due to the lack of symbol exports in the shared library (attributes for symbol visibility were introduced in gcc 4.0, but many libraries do not use them because gcc does its “magic”, especially MinGW, which will “guess” the symbols).
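The usual way to make symbol exports explicit for both compilers is a small macro in a public header. The following is a minimal sketch, assuming hypothetical names (`ZS_API`, `ZS_BUILDING_DLL`, `zs_example_version`); it is not taken from the zsync sources.

```c
/* Export/import decoration for a shared library.
 * ZS_BUILDING_DLL is assumed to be defined by the build system
 * only while compiling the library itself. */
#if defined(_WIN32)
#  ifdef ZS_BUILDING_DLL
#    define ZS_API __declspec(dllexport)
#  else
#    define ZS_API __declspec(dllimport)
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define ZS_API __attribute__((visibility("default")))
#else
#  define ZS_API
#endif

/* A public function of the (hypothetical) library. */
ZS_API int zs_example_version(void);

int zs_example_version(void)
{
    return 1;
}
```

On Windows, the library itself would be compiled with ZS_BUILDING_DLL defined, while code that merely uses the library would see the dllimport variant.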

Porting to Visual C++ 2010 required either giving up some of the C99 features in use (e.g. moving variable declarations to the beginning of the functions) or adding a lot of C++-specific workarounds (extern “C”).
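The declaration issue is easy to show with a toy function written twice. Both versions below are illustrative only (`sum_c99` and `sum_c89` are made-up names): the first uses C99 features, the second is restricted to what a C89-only compiler accepts.

```c
#include <stddef.h>

/* C99 style: the loop variable is declared inside the for statement. */
unsigned long sum_c99(const unsigned char *p, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* C89 style: every declaration sits at the top of the block,
 * before the first statement. */
unsigned long sum_c89(const unsigned char *p, size_t n)
{
    unsigned long total;
    size_t i;

    total = 0;
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

The rewrite is mechanical, but across a whole code base it produces a large diff with no functional change.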

I was a bit worried upstream would not accept this code because it didn’t really provide any benefit for the application (the only gain was for the developer: a great IDE and a very powerful debugger), therefore I didn’t finish the Visual C++ port. Maybe some day, if Microsoft decides to finally provide C99.

The result (so far) is available in the zsync-windows space in Assembla.

 

4 thoughts on “Guidelines on cross-platform porting”

  1. sys

    > Rsync was not useful in this case: it would download the whole file, gigabytes!
    Something was wrong in that setup. I remember downloading Knoppix using a bad internet connection and rsync. After a faulty download (md5sums didn’t match, either), rsync only re-downloaded the bad parts. Later I could execute rsync again. Finally md5sums matched.

    From the official “rsync algorithm” page ( https://rsync.samba.org/tech_report/ ):
    “The algorithm identifies parts of the source file which are identical to some part of the destination file, and only sends those parts which cannot be matched in this way. Effectively, the algorithm computes a set of differences without having both files on the same machine. The algorithm works best when the files are similar, but will also function correctly and reasonably efficiently when the files are quite different. ”

    From “http://everythinglinux.org/rsync/”:
    “Diffs – Only actual changed pieces of files are transferred, rather than the whole file. This makes updates faster, especially over slower links like modems. FTP would transfer the entire file, even if only one byte changed.
    Compression – The tiny pieces of diffs are then compressed on the fly, further saving you file transfer time and reducing the load on the network.”

    1. sys

      OK! I’ve got a test case to demonstrate that rsync does not download all the file:

      1) I executed
      rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
      and my system downloaded about 7,27 MiB.
      Then rsync wrote
      sent 48 bytes received 7399136 bytes 30016.97 bytes/sec
      total size is 7396548 speedup is 1.00

      I executed
      md5sum qemu.exe
      and it answered
      8ebdbc46620badb76972fabd3de25b1f qemu.exe

      2) I executed again
      rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
      and my system downloaded about 0 MiB.
      Then rsync wrote
      sent 29 bytes received 62 bytes 20.22 bytes/sec
      total size is 7396548 speedup is 81280.75

      I executed
      md5sum qemu.exe
      and it answered
      8ebdbc46620badb76972fabd3de25b1f qemu.exe
      what was the same result as before, of course.

      3) I launched Okteta and with it, I changed a byte of the “qemu.exe” file.

      I executed
      md5sum qemu.exe
      and it answered
      b0fecd0af32c0fe9681fc440ec92a7f2 qemu.exe
      what was a different result than before, of course.

      4) I executed again
      rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
      and my system downloaded about 0 MiB.
      Then rsync wrote
      sent 16416 bytes received 2821 bytes 4274.89 bytes/sec
      total size is 7396548 speedup is 384.50

      I executed
      md5sum qemu.exe
      and it answered
      8ebdbc46620badb76972fabd3de25b1f qemu.exe
      what was the correct result.

      * * *

      Note: rsync can be used with ssh, for having enough security.

  2. sys

    Ow! This website “ate” my spaces. I’ll try again, but using underscores to indent:

    OK. I’ve got a test case to demonstrate that rsync does not download all the file:

    1) I executed
    ____rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
    and my system downloaded about 7,27 MiB.
    Then rsync wrote
    ____sent 48 bytes received 7399136 bytes 30016.97 bytes/sec
    ____total size is 7396548 speedup is 1.00

    I executed
    ____md5sum qemu.exe
    and it answered
    ____8ebdbc46620badb76972fabd3de25b1f qemu.exe

    2) I executed again
    ____rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
    and my system downloaded about 0 MiB.
    Then rsync wrote
    ____sent 29 bytes received 62 bytes 20.22 bytes/sec
    ____total size is 7396548 speedup is 81280.75

    I executed
    ____md5sum qemu.exe
    and it answered
    ____8ebdbc46620badb76972fabd3de25b1f qemu.exe
    what was the same result as before, of course.

    3) I launched Okteta and with it, I changed a byte of the “qemu.exe” file.

    I executed
    ____md5sum qemu.exe
    and it answered
    ____b0fecd0af32c0fe9681fc440ec92a7f2 qemu.exe
    what was a different result than before, of course.

    4) I executed again
    ____rsync -avz rsync://ftp.uni-kl.de/knoppix/qemu-0.8.1/qemu.exe .
    and my system downloaded about 0 MiB.
    Then rsync wrote
    ____sent 16416 bytes received 2821 bytes 4274.89 bytes/sec
    ____total size is 7396548 speedup is 384.50

    I executed
    ____md5sum qemu.exe
    and it answered
    ____8ebdbc46620badb76972fabd3de25b1f qemu.exe
    what was the correct result.

    * * *

    Note: rsync can be used with ssh, for having enough security.

  3. Solerman Kaplon

    The good thing about zsync (which rsync does not support) is doing partial downloads of zipped files. Yup, you can zip it, download, change a bit, recompress and download only the changed part without uncompressing the whole file. Great for Java apps distributed as JAR/WAR/EAR and Rails apps running on JRuby, also as WAR files. Now one just needs to find a way to get a glibc-free version of it on Linux so one can distribute the binary with the app without worrying about binary API mismatches between different distros (I got lots of trouble between different CentOS versions).
