A wish a day 2: Calligra document versioning

Calligra Suite is an integrated suite of work applications based on Qt and KDE. It is a continuation of KOffice, and it’s where most of the former KOffice developers have landed. Calligra includes a text processor, a spreadsheet, a presentations application, a database, vector and image editors, etc.

Two years ago I made a proposal for in-document versioning for (then) KOffice applications. It would be based on libQtGit, which I wished about yesterday, and a new library tentatively called libQVersioned, which would provide a higher-level API.

This is what I wrote back then, with a few clarifications added:

I’m working on a more general solution towards document versioning.

First, I’m developing libQtGit, which provides a Qt-like API to use git. Unfortunately, it’s been on hold for three months because I was too busy with real work. The good news is last Monday I started hacking on it again.

Second, I intend to develop a framework (tentatively named QVersioned for now) on top of libQtGit. QVersioned will provide QVersionedDir and QVersionedFile classes (amongst others, but those two are the most important). Those classes would essentially have the same API as QDir and QFile. QVersioned would open ZIP files and store a .git directory inside. There are cases, like OpenDocument files (which are already ZIP files), where nothing special would be needed. For other cases (for instance, a .txt file), there would be a .txtv file (meaning “.txt versioned”), which would be a ZIP file containing the .txt plus the .git directory.
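As a rough illustration of the naming rule above, here is a minimal sketch in plain C++ (no Qt). The function name versionedContainerFor and the list of extensions treated as "already a ZIP file" are assumptions made for this example; they are not part of QVersioned.

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: returns the path of the ZIP container that would hold
// both the document and its .git directory.
fs::path versionedContainerFor(const fs::path& document)
{
    // Assumption: these OpenDocument extensions are already ZIP containers.
    static const std::array<std::string, 4> zipBased = {".odt", ".ods", ".odp", ".odg"};

    // Compare extensions case-insensitively.
    std::string ext = document.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (std::find(zipBased.begin(), zipBased.end(), ext) != zipBased.end())
        return document;   // The .git directory goes inside the file itself.

    fs::path container = document;
    container += "v";      // e.g. notes.txt becomes notes.txtv
    return container;
}
```

With this sketch, versionedContainerFor("notes.txt") yields notes.txtv, while an .odt path is returned unchanged.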

Now, how did I intend to implement ODF versioning, which was going to be the “this thing works” case?:

  • The .git folder would store the uncompressed contents of the ODF file, i.e. XML, images, etc. This is needed to avoid duplication, allow diffs, etc.
  • There would also be a checkout, which would provide a full copy of the latest version (XML, images, etc.) in the ZIP file, just like any ODF-capable application not supporting QVersioned would expect (these applications would just ignore the .git directory)
  • Clarification: what items 1 and 2 mean is that, to version an ODF file, you would do something like:
    1. mkdir doc
    2. cd doc && unzip ../document.odt
    3. git init
    4. git add .
    5. git commit -a -m "<comment provided by user>"
    6. zip -9 -r ../document.odt .git *
  • When a QVersioned-capable application opens an ODF file, it compares the XML, images, etc to the latest version in git:
    • If the diff is empty, last save was by a QVersioned-capable application
    • If the diff is not empty, last save was by a QVersioned-incapable application. A new “git commit -a” is performed. Yes, we probably have lost “versions” of the document in between this commit and the former one but something’s better than nothing.
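The open-time check described in the last bullet can be sketched as follows. This is only an in-memory model: Snapshot, History and syncOnOpen are hypothetical names invented for this example, and a real implementation would ask git (through libQtGit) for the diff instead of comparing maps.

```cpp
#include <map>
#include <string>
#include <vector>

// Stand-in for a tree of files: path inside the ZIP -> file contents.
using Snapshot = std::map<std::string, std::string>;

// Hypothetical in-memory history; a real implementation would query git.
struct History {
    std::vector<Snapshot> commits;
    std::vector<std::string> messages;
};

// Returns true if a catch-up commit had to be recorded, i.e. the last save
// was made by an application that did not update the embedded repository.
bool syncOnOpen(History& history, const Snapshot& checkout)
{
    // Empty diff: the checkout matches the latest commit, nothing to do.
    if (!history.commits.empty() && history.commits.back() == checkout)
        return false;

    // Non-empty diff (or no history yet): record the current state.
    history.commits.push_back(checkout);
    history.messages.push_back("Save latest unversioned version on ODF document opening");
    return true;
}
```

Calling syncOnOpen twice with the same snapshot records a commit the first time only; the second call finds an empty diff and returns false.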

By using libQtGit and QVersioned, it would also be possible to add collaboration features such as “send me an update” (i.e. send me a diff which transforms my outdated document into your latest version), collaborative editing (send diffs back and forth) and more things I cannot think of right now.

In case you are interested in the libQtGit API (remember QVersioned will offer a higher-level API), this is the signature of the method you would call:

Git::Repository::Commit* Git::Repository::commit (
    const QString& message = QString(),
    const Commit* c = 0,
    const QString& author = QString(),
    const CommitFlags cf = DefaultCommitFlags,
    const CleanUpMode cum = DefaultCleanUpBehavior
);

That’s equivalent to “git commit -C commit_id -m "message"”. CommitFlags is a QFlags and CleanUpMode a QEnum:

enum CommitFlag {
    DefaultCommitFlags = 0x0 /*< git commit */,
    OverrideIndexCommit = 0x1 /*< git commit -a */,
    SignOffCommit = 0x2 /*< git commit -s */,
    AmendCommit = 0x4 /*< git commit --amend */,
    AllowEmptyCommit = 0x8 /*< git commit --allow-empty */,
    NoVerifyCommit = 0x10 /*< git commit -n */
};
Q_DECLARE_FLAGS ( CommitFlags, CommitFlag )

enum CleanUpMode {
    DefaultCleanUpBehavior = 0x0 /*< git commit */,
    VerbatimClean = 0x1 /*< git commit --cleanup=verbatim */,
    WhiteSpaceClean = 0x2 /*< git commit --cleanup=whitespace */,
    StripClean = 0x4 /*< git commit --cleanup=strip */,
    DefaultClean = 0x8 /*< git commit --cleanup=default */
};
Q_ENUMS ( CleanUpMode )

For the “git commit -a -m "Save latest unversioned version on ODF document opening"” case, we would use:

// Assuming 'repo' is a valid Git::Repository object

repo->commit (
    "Save latest unversioned version on ODF document opening",
    0,
    "(The application would probably take the author's name from the product registration)",
    Git::Repository::OverrideIndexCommit
);

So, how is libQtGit doing? Well, the API is there for git add, commit, init, mv, rm, checkout, clone, branch, revert, reset, clean, gc, status, merge, push, fetch, pull, rebase, config, update-server-info and (partially) symbolic-ref. When I say “the API is there” I mean “all the QFlags, QEnums, methods and classes are defined, and their translation to git parameters is done”. It’s just a matter of implementing the QProcess part, parsing output, etc. Boring and time-consuming but easy.
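To give an idea of what “translation to git parameters” could look like, here is a minimal sketch in plain C++ (no Qt, so it can be compiled on its own). The function commitArguments is hypothetical and is not libQtGit code; the enum is a plain mirror of CommitFlag that assumes every flag occupies its own bit, so that flags can be OR-ed together safely.

```cpp
#include <string>
#include <vector>

// Plain C++ mirror of the commit flags; assumption: one distinct bit per flag.
enum CommitFlag : unsigned {
    DefaultCommitFlags  = 0x0,
    OverrideIndexCommit = 0x1,
    SignOffCommit       = 0x2,
    AmendCommit         = 0x4,
    AllowEmptyCommit    = 0x8,
    NoVerifyCommit      = 0x10
};

// Hypothetical helper: builds the argument list that would be handed to the
// git executable (for instance through QProcess).
std::vector<std::string> commitArguments(unsigned flags, const std::string& message)
{
    std::vector<std::string> args{"commit"};
    if (flags & OverrideIndexCommit) args.push_back("-a");
    if (flags & SignOffCommit)       args.push_back("-s");
    if (flags & AmendCommit)         args.push_back("--amend");
    if (flags & AllowEmptyCommit)    args.push_back("--allow-empty");
    if (flags & NoVerifyCommit)      args.push_back("-n");
    if (!message.empty()) {
        args.push_back("-m");
        args.push_back(message);
    }
    return args;
}
```

For example, commitArguments(OverrideIndexCommit | SignOffCommit, "msg") produces the arguments commit, -a, -s, -m, msg, in that order. Keeping each flag on a separate bit matters here: a value that shares bits with other flags would switch on their options as well.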

In addition to file versioning and collaboration, there is another interesting feature (that I will wish about tomorrow) that could be achieved.

 

8 thoughts on “A wish a day 2: Calligra document versioning”

  1. dd

    Versioning in Calligra Words seems already to exist.
    At least in git, and not really helpful yet, but the beginning is made.
    And since OpenOffice and LibreOffice should be able to read the files, I don’t think git would be a good tool for this. It probably has to be pure XML.

    And since git is really bad with binary data, it would probably make much sense to do git versioning in for example krita.

  2. pgquiles Post author

    I’m going to partially answer your points but please re-read the post:

    Versioning in Calligra Words seems already to exist.
    At least in git, and not really helpful yet, but the beginning is made.

    Nice to know. It was not even in the works when I proposed this in March 2009.

    And since OpenOffice and LibreOffice should be able to read the files, I don’t think git would be a good tool for this. It probably has to be pure XML.

    Have you read the technical details? What I am proposing is to embed a git repository in the ODT, ODS, etc file (which is a ZIP file) in addition to the normal checkout OpenOffice and other OpenDocument-ready applications support.

    And since git is really bad with binary data, it would probably make much sense to do git versioning in for example krita.

    To solve this, a binary diff tool for Krita layers would be needed (probably the code already exists in Krita), and then you can tell git to use another diff tool if working with Krita files.

    Anyway, this was mostly targeted at text documents, not at pictures.

  3. jstaniek

    Very interesting post! Just a side note: Calligra is definitely not a fork, since all but one application were moved/removed from the koffice svn directory and put into calligra git. E.g. there is no Kexi in the “koffice” git repository since it is developed in Calligra. To have a fork, one needs to copy an application; here at Calligra we see all but one of the original maintainers and developers in place. Also, the git repo called koffice is not the continuation of the “koffice” project, since it has no official right to the name. Expect its status to be clarified this year.

  4. jstaniek

    After some thinking, I have a note: ODF is an XML-based, structured and context-sensitive format. Git operates per line. Let’s imagine a long line with many nested XML elements. A change within a particular XML element would be visible to git just as a change to the entire line. Git’s scope is limited to source code and plain-text blueprints, without regard to structure. If there were no issues in merging (e.g. with preserving the validity of XML), tools like [http://www.oxygenxml.com/xml_diff_and_merge.html] would not exist. I recommend looking at this example.

    Secondly, using git for storage gives no benefit compared to the complexity of the solution, even if you wrap it with libqtgit.

    Third, interoperability. LibreOffice won’t be able to access the changes unless you port the solution to C++-only. Interestingly, the ODF standard has changes support, every character can be marked with change definition.

    Also, the advantages of git are features for power users utilizing the command line (advanced merging, editing history, dozens of complex concepts that even professionals tend to skip in daily work). Users of office suites are probably not the target audience. It is even hard to imagine a workable workflow where the advantages of git are used. If the workflow is simple, then any use of git specifics and advantages has no application.

    That said, I have seen some room for research regarding using git in office apps. What I mean is _maybe_ applying it to archiving databases. Even then, using extra tables for storing versions is more reliable and natural, not to mention that some database systems (in particular file systems) have versioning built in.

  5. pgquiles Post author

    Let’s see:

    The XML vs per-line problem is easily solvable with tools such as diffxml. You can tell git to use diffxml instead of plain diff.

    Complexity vs benefit: it really depends on how much you squeeze out of git. For instance, see the third wish, translations, or the collaborative editing I talked about in this post (it’s just a matter of sending diffs to the participants).

    Interoperability? There are two points here, one is “we will have document revisions based on git” and the other is “we will implement revision-control-based-on-git by means of libqtgit”. The first point is the important one, the latter is just an implementation issue. Calligra’s implementation may be based on libqtgit, LibreOffice’s on execve and calling git directly. Or just no support for this kind of revision control.

    As for the power-user usage of git, obviously my proposal only requires a small subset of what git can do. In fact, I’m talking about git only to make this easier to grasp; this could be implemented using any version control system (svn, mercurial, or even a custom one developed for this purpose).

  6. jstaniek

    “The XML vs per-line problem is easily solvable with tools such as diffxml.”

    Let’s say that the beauty of engineering is to avoid creating extra and unnecessary problems while trying to solve one. Git introduces problems when applied as a solution to the problem you described. That’s my informed and ultimate opinion.

    That said I like your recent activity, keep it going!
