Hello UN Corpora

The UN Corpora website is now live, with the corpus files, a basic description, and links to tools and to this blog.

A quick start:

1. Download and unpack the corpus.
2. Download and install XMLStarlet.
3. Run the following command (use C:\install_path\xml.exe on Windows, xmlstarlet on Unix):

xmlstarlet sel -e utf8 -t -m "//tu[.//hi/@type='lead']" -v "@tuid" -m "tuv" -o [...]
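
The visible part of the command matches every tu element that contains a hi element with type 'lead', prints its tuid, and then loops over its tuv children. For readers who would rather script this, a rough Python equivalent of that visible portion might look like the sketch below. The sample document, the function name lead_units, and the choice to print the plain text of each tuv are assumptions made for illustration; the rest of the original command is elided, and the real corpus files may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature sample; the real corpus layout may differ.
SAMPLE = """<tmx><body>
<tu tuid="1">
  <tuv xml:lang="EN"><seg><hi type="lead">Noting</hi> the report</seg></tuv>
  <tuv xml:lang="FR"><seg><hi type="lead">Notant</hi> le rapport</seg></tuv>
</tu>
<tu tuid="2">
  <tuv xml:lang="EN"><seg>No highlighted words here</seg></tuv>
</tu>
</body></tmx>"""


def lead_units(root):
    """Yield (tuid, [text of each tuv]) for every tu containing <hi type="lead">."""
    for tu in root.iter("tu"):
        # Counterpart of the XPath predicate [.//hi/@type='lead'].
        if tu.find(".//hi[@type='lead']") is None:
            continue
        # Counterpart of -m "tuv": direct tuv children only.
        # Assumption: we want the flattened text of each tuv.
        texts = ["".join(tuv.itertext()).strip() for tuv in tu.findall("tuv")]
        yield tu.get("tuid"), texts


if __name__ == "__main__":
    for tuid, texts in lead_units(ET.fromstring(SAMPLE)):
        print(tuid, *texts, sep="\t")
```

To run it against a real file, replacing ET.fromstring(SAMPLE) with ET.parse(path).getroot() should be enough, provided the file fits in memory; for very large files, ET.iterparse would be the more cautious choice.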