The main website has been updated with a number of items, just in time for the MT Summit XII poster session:
- The code repository has been updated. The tool can now extract specific languages, remove vote segments and footnotes, and flatten in-paragraph annotations.
- A new version of the corpus has been uploaded, processed with the tool above to remove footnotes and flatten the in-paragraph annotations. The content is otherwise unchanged. This version is better suited to direct import into commercial tools, which may not implement the more advanced features of TMX.
- The website now links to the paper describing the corpus.
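To illustrate what extracting a language pair and flattening annotations can mean for a TMX file, here is a minimal sketch using only Python's standard library. It is not the repository's code. It assumes that in-paragraph annotations are encoded as TMX inline elements: text inside `<hi>` is kept, while native-code elements (`<bpt>`, `<ept>`, `<it>`, `<ph>`, `<ut>`) are dropped. The function names `flatten_seg` and `extract_pair` and the sample segment are hypothetical.

```python
import xml.etree.ElementTree as ET

# xml:lang as ElementTree reports it (TMX 1.4); older TMX versions used plain "lang"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: these inline elements hold native markup, not translatable text
CODE_TAGS = {"bpt", "ept", "it", "ph", "ut"}


def _text(elem):
    """Collect the text of elem, recursing into <hi>-style children
    but skipping native-code elements (their tails are still kept)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_TAGS:
            parts.append(_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def flatten_seg(seg):
    """Hypothetical helper: return a <seg> as plain text with normalized spaces."""
    return " ".join(_text(seg).split())


def extract_pair(tmx_string, src, tgt):
    """Hypothetical helper: return (source, target) pairs for two language
    codes, keeping only translation units that contain both languages."""
    root = ET.fromstring(tmx_string)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, tuv.get("lang", "")).lower()
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang] = flatten_seg(seg)
        if src.lower() in segs and tgt.lower() in segs:
            pairs.append((segs[src.lower()], segs[tgt.lower()]))
    return pairs


# Hypothetical sample: a highlighted word plus a <ph> placeholder standing in for a footnote marker
SAMPLE = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="EN-GB"><seg>The <hi type="b">vote</hi> is closed.<ph>&lt;fn1&gt;</ph></seg></tuv>
  <tuv xml:lang="DE-DE"><seg>Die <hi type="b">Abstimmung</hi> ist geschlossen.</seg></tuv>
</tu>
</body></tmx>"""

if __name__ == "__main__":
    for en, de in extract_pair(SAMPLE, "en-gb", "de-de"):
        print(en, "|", de)
```

Keeping the text of `<hi>` while discarding code elements reflects the trade-off described above. A tool that ignores inline markup still receives readable sentences, but any formatting information carried by those elements is lost.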
