Comma-separated values (CSV) files are delimited text files that, most typically, use a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields (or columns), separated by commas.
CSV files are used to save and transfer structured information in a simple, easy-to-read manner.
Unbabel filter specifications
When handling a CSV file, the Unbabel filter decides which content to translate and which to leave out, according to a set of rules. The basics are represented in this image:
Below you can find the most significant rules of the filter:
- Commas - and only commas - are assumed to separate columns unless escaped inside "" quotation marks.
- Quotation marks "" are used to qualify text as a single column/field.
- We recognize, extract and translate content from all columns and rows.
- Spaces are preserved in the file but are trimmed when the file is split or read by most software.
- Content can use \t and \n.
- The filter treats \\ as characters and \uXXXX as UTF-8 encoded characters.
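As a rough illustration of the comma and quotation-mark rules above, the sketch below reads a small CSV sample with Python's standard csv module. This is not the Unbabel filter itself, and the sample data is made up; it only shows how a typical CSV parser applies the same two conventions (commas separate columns, quotation marks qualify a single field).

```python
import csv
import io

# Hypothetical sample data: the second field of record 1 contains a comma,
# so it is qualified with quotation marks. Record 2 has padding spaces.
sample = 'id,text\n1,"Hello, world"\n2,  padded  \n'

rows = list(csv.reader(io.StringIO(sample)))

for row in rows:
    print(row)
```

With its default settings, csv.reader keeps the comma inside the qualified field and leaves the surrounding spaces of the last field untouched. Other tools may trim those spaces when they open the file.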
Certain character combinations act as placeholders, effectively blocking the content within them from being translated. If the column/field contains other content, placeholders and their respective text are displayed to the editors but can't be changed. Editors can still move them within the sentence to allow for syntax correction.
If a field only contains a placeholder, it is not displayed nor translated.
The following character combinations will work as placeholders (capitalization is required when present):
All content surrounded with single angle brackets is considered HTML by the filter and is removed from all steps of the translation. Ex: I am <b>sending</b> this for translation -> I am sending this for translation.
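To get a feel for what this removal does to a sentence, here is a minimal sketch in Python. The regular expression and the function name strip_angle_brackets are assumptions made for illustration; the actual filter may match tags differently.

```python
import re

# Assumption: anything between a "<" and the next ">" counts as a tag.
TAG_PATTERN = re.compile(r"<[^<>]*>")


def strip_angle_brackets(text: str) -> str:
    """Remove every <...> span from the text (hypothetical helper)."""
    return TAG_PATTERN.sub("", text)


html_example = strip_angle_brackets("I am <b>sending</b> this for translation")
non_html_example = strip_angle_brackets("Price <in EUR> applies")

print(html_example)
print(non_html_example)
```

The second call shows the side effect of this rule: text placed between angle brackets that is not HTML, such as "in EUR", disappears along with the brackets.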
- Avoid using <> on anything other than HTML. Doing so will deprive both our MT model and human editors of the content inside the brackets, which will compromise the translation.
- Make sure to use proper qualifiers to escape content you don't want to be broken into different columns.
- If you're escaping a sentence with a comma and your qualifiers are not at the beginning and end of the field, the comma will break it into two fields. Ex: This is a "strange, yet true" statement is split at the comma into two fields (This is a "strange and yet true" statement). If you want it treated as a single field, send "This is a "strange, yet true" statement" instead.
- Some software will escape content again when creating a CSV. For example, Microsoft Excel will escape content between commas and then again at the beginning and end of the cell, effectively triple-escaping, if any quotation marks are used.
- The output CSV preserves the number of white spaces in the source. However, when the file is read by certain software, spaces are trimmed: any spaces at the beginning or end of a column are removed, and multiple consecutive spaces inside qualified text are reduced to one.
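The escaping and trimming behavior described in the points above can be sketched with Python's standard csv module. This is an illustration under assumptions, not the filter's own logic: the csv module follows the common convention of doubling inner quotation marks, which may differ from what other tools produce, and simulate_trimming is a hypothetical helper that imitates the space trimming described above.

```python
import csv
import io
import re

field = 'This is a "strange, yet true" statement'

# Write one record; the writer qualifies the field because it contains
# a comma and quotation marks, and doubles the inner quotation marks.
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerow(["1", field])
line = buffer.getvalue()
print(line)

# Reading the line back yields the original text as a single field.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)


def simulate_trimming(value: str) -> str:
    """Hypothetical helper: strip outer spaces, collapse runs of spaces."""
    return re.sub(r" {2,}", " ", value.strip(" "))


trimmed = simulate_trimming("  strange,   yet true ")
print(trimmed)
```

Checking the raw text of an exported file in this way, before sending it, helps confirm how many layers of quotation marks the exporting tool has added.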
Download the attachment for a template of a valid CSV file.