Two corpus files reposted

I just updated two corpus files (the second and third) because they had an invalid XML encoding declaration: they used utf8 instead of UTF-8.

Unfortunately, XMLStarlet, which I used to pretty-print the XML, accepted utf8 as a valid value and wrote it straight into the file. This did not affect everybody, but at least [...]
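For anyone who still has an old copy of the files, the declaration can be repaired locally. The following is only a minimal sketch in Python, not the procedure used to fix the corpus: the function name `fix_encoding_declaration` is made up for illustration, and it assumes the XML declaration sits at the very start of the data (no byte order mark or leading whitespace).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the data, e.g. <?xml version="1.0" encoding="utf8"?>
_DECL_RE = re.compile(
    rb'^(<\?xml[^>]*?\bencoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def fix_encoding_declaration(data: bytes) -> bytes:
    """Rewrite encoding="utf8" (any letter case) to encoding="UTF-8".

    Hypothetical helper for illustration. Data without a declaration, or
    with any other encoding name, is returned unchanged.
    """
    match = _DECL_RE.match(data)
    if match is None or match.group(3).lower() != b"utf8":
        return data
    prefix, quote = match.group(1), match.group(2)
    # Keep the original quote style and everything after the declaration value.
    return prefix + quote + b"UTF-8" + quote + data[match.end():]
```

To apply it to a file, read the file in binary mode, pass the bytes through the function, and write the result back. Only the declaration is touched, so the document content stays byte-for-byte identical.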

New corpus version, updated tool, and a copy of the paper are now available

The main website has been updated with a number of items, just in time for the MT Summit XII poster session:

The code repository has been updated. Specific languages can now be extracted, vote segments removed, footnotes removed, and in-paragraph annotations flattened. A new version of the corpus has been uploaded, processed by the [...]