1. What do we annotate and why?
Unbabel translates content using a combination of machine translation (MT) and human editing. We translate and annotate tickets (customer support emails), chat conversations (between support agents and their clients) and various kinds of publishable content (press releases, product catalogs, technical documentation, app content and so on).
In order to monitor the quality of the translations we provide and to train our own internal AI systems, we collect annotation data. During the annotation process, a linguist uses Unbabel’s Annotation Tool to carefully evaluate a translated text, identify errors and classify them according to their type and severity. The error typology in use at Unbabel is MQM-compliant.
2. The Annotation Tool
In this section, you will learn how to annotate in the Annotation Tool.
2.1. Getting started
- Open the Annotation Tool and click on ‘Sign in’.
- Enter your Unbabel username and password, and check the box saying ‘I’m not a robot’.
- You will see a list of batches (groups of jobs) that are available for annotation, each with information about language pair, creation date and progress status bar.
- To start annotating, click on a batch language pair, in blue. This will open all jobs for that batch and language pair.
- To log out, go back to the dashboard, click on your username in the top right corner of the dashboard page, and then on ‘Logout’.
2.2. The Annotation Interface
The annotation interface consists of a top bar and the left, middle and right panels.
The top bar
On the top bar you will find:
- The ‘Back’ button. This button leads back to the Annotation Tool dashboard page. When you click on it, your browser will display a message regarding the changes made on the batch. Please make sure you don’t click on this button if you are annotating a unit and you have not clicked on the button ‘Add’ yet.
- The Navigation bar, where all the batch jobs you are currently working on are represented by circles:
  - Empty dark blue circle: a job you’re currently working on;
  - Empty light blue circle: a job that has not yet been annotated;
  - Half-filled blue circle: a job in progress;
  - Blue circle: a completed job;
  - Red circle: a reported job.
When you hover your mouse over the circles, in addition to the job’s status information, you can find the job number and the number of words in the target text. For example, the following job is the 14th job in the batch, and the target text contains 307 words:
- Register information for the job you are currently annotating. If there is no information to display, this field will read ‘None’.
- Time spent on the batch. From the moment you open the annotation batch, the clock starts counting. Please note that there is an idle detector which will stop the clock if there is no activity on the batch for more than 5 minutes.
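The idle detector can be pictured with a short Python sketch. It assumes that activity is recorded as an ascending list of timestamps in seconds and that any gap longer than five minutes is not counted at all. The function name `active_seconds` and the input format are hypothetical, and the tool's exact behavior (for example, whether the first five minutes of an idle gap still count) is not specified in these guidelines.

```python
IDLE_LIMIT_SECONDS = 5 * 60  # idle threshold stated in the guidelines


def active_seconds(activity_times, idle_limit=IDLE_LIMIT_SECONDS):
    """Sum the time between consecutive activity events, skipping idle gaps.

    activity_times: timestamps in seconds, ascending (hypothetical input).
    Assumption: a gap longer than idle_limit is idle time and is not counted.
    """
    total = 0
    for previous, current in zip(activity_times, activity_times[1:]):
        gap = current - previous
        if gap <= idle_limit:
            total += gap
    return total


# Activity at 0 s, 60 s and 200 s, a 10-minute pause, then 30 more seconds.
print(active_seconds([0, 60, 200, 800, 830]))  # 230
```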
Left side panel
On the left side panel, you will find three drop-down menus: ‘Information’, ‘Annotations’ and ‘Proposed Corrections’.
- On the ‘Information’ menu, you will find the name of the error typology associated with the annotation task. The name of the annotation typology associated with these guidelines is unbabel_typology_v3.
- You will also see the Register information for the job you’re currently working on and the client’s instructions. Please read these last two fields carefully before starting a new job. Please note that the Annotation Tool doesn’t open this menu by default for every annotation job, so, if in the previous task the last menu opened was not ‘Information’, you will have to open it manually.
- On the ‘Annotations’ menu, you will find the list of the annotations already added on the current open job. Annotations can be sorted by: i) Position of the annotated unit in the target text; ii) Type of annotation error; iii) Severity of the annotated error. When annotations are sorted by ‘Type’, you will see the error categories in alphabetical order; when annotations are sorted by ‘Severity’, you will see the errors ordered by their severity.
- On the ‘Proposed Corrections’ menu, you will find a list of corrections suggested by you during the annotation process, sorted by their position in the source text. The head of the entries on the list is the selected source text in bold characters, followed by a comment (mandatory) and then by an em dash ‘━’ and the correction proposed.
While switching between jobs, the left side panel will show different menus by default, depending on the status of the job you are in. If the job that you open is a new job without any annotation on it, then the ‘Information’ menu will be shown automatically, so that you won’t miss any client’s instructions. If the job that you open is a job that you already worked on, then the ‘Annotations’ menu will be opened by default, so that you can review your work easily. You can always switch menus by expanding them.
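The three sort orders of the ‘Annotations’ menu can be illustrated with a minimal Python sketch. The `Annotation` class, its field names and the `sort_annotations` function are hypothetical. The direction of the severity sort (most severe first) is also an assumption, since the guidelines only say that errors are ordered by severity.

```python
from dataclasses import dataclass

# Assumed order for the 'Severity' sort: most severe first.
SEVERITY_RANK = {"Critical": 0, "Major": 1, "Minor": 2, "Neutral": 3}


@dataclass
class Annotation:
    position: int    # offset of the annotated unit in the target text
    error_type: str  # e.g. "Spelling"
    severity: str    # "Neutral", "Minor", "Major" or "Critical"


def sort_annotations(annotations, by="position"):
    """Return a new list sorted by position, type (alphabetical) or severity."""
    if by == "position":
        return sorted(annotations, key=lambda a: a.position)
    if by == "type":
        return sorted(annotations, key=lambda a: a.error_type.lower())
    if by == "severity":
        return sorted(annotations, key=lambda a: SEVERITY_RANK[a.severity])
    raise ValueError(f"unknown sort key: {by}")


annotations = [
    Annotation(30, "Spelling", "Minor"),
    Annotation(5, "Omission", "Critical"),
    Annotation(12, "Agreement", "Major"),
]
print([a.error_type for a in sort_annotations(annotations, by="type")])
# ['Agreement', 'Omission', 'Spelling']
```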
Right side panel
Similar to the left side, on the right side panel you will find three drop-down menus: ‘Annotate’, ‘Propose Corrections’ and either ‘Finish’ or ‘Report’.
- The ‘Annotate’ menu is where you classify the unit selected in the target text. This menu contains three fields: ‘Selection’, ‘Error Type’ and ‘Severity’.
- Under ‘Selection’, the selected text string(s) will appear between quotes “ ”.
- Under ‘Error Type’, you will find a search box with the typology used to annotate the errors.
You can also use the search box to type the names of each error type, and they will appear in the list for you to select.
- After choosing the error type, you must select the severity of the error: Neutral, Minor, Major or Critical.
Finally, the ‘Clear’ and ‘Add’ buttons allow you either to erase the current selected unit (‘Clear’) or to add the annotation to the ‘Annotations’ list (‘Add’) on the left side panel.
For us to make full use of your annotations, we'll ask you to provide a correct translation, and only one, for each error you annotate, except for errors for which non-native speakers can easily infer the correct translation, such as Capitalization, Whitespace or Punctuation issues.
- In the ‘Propose Corrections’ menu, under ‘Source’, you will find the unit previously highlighted by you in the source text. You will also find two text boxes: ‘Proposed Translation’ and ‘Comment’, where you must write your suggestion for the translation of the selected unit, and a comment (mandatory) on why the correction should be made, or any other relevant information. Both boxes can be expanded by clicking and dragging their bottom right corner.
Finally, the ‘Clear’ and ‘Add’ buttons either erase the current correction proposed (‘Clear’) or they add it to the ‘Proposed Corrections’ menu (‘Add’) on the left side panel.
- In the ‘Submit or Report’ menu, you will be able to either submit or report the task. The default option for every task is ‘Submit’. Under that option, in ‘Task fluency’, you will rate how natural the text sounds in the target language on a scale of 1 to 5 stars, and you can leave a comment in the ‘Task comment’ box if necessary. An annotation task can be submitted without comments, but not without its fluency rating. Important: once you submit a job, you will not be able to come back to it and edit your annotations.
In case you need to report the task, you will be asked to state the reason for reporting:
- 'Wrong Language Pair' - When the Language Pair is completely wrong, not the one you were expecting or work with.
- 'Mixed Languages' - When the source text contains text in a language other than the expected source language (i.e. the target language or any other language).
- 'Other' - Please note that if you select this option you will have to specify the reason.
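The submit and report rules above can be summarized in a small Python sketch. The function names `can_submit` and `can_report` are hypothetical and only restate the rules given in this section. They do not describe how the Annotation Tool is implemented.

```python
REPORT_REASONS = {"Wrong Language Pair", "Mixed Languages", "Other"}


def can_submit(fluency_stars, comment=""):
    """A task needs a 1 to 5 star fluency rating; the comment is optional."""
    return fluency_stars is not None and 1 <= fluency_stars <= 5


def can_report(reason, details=""):
    """A report needs one of the listed reasons; 'Other' must be explained."""
    if reason not in REPORT_REASONS:
        return False
    if reason == "Other":
        return bool(details.strip())
    return True


print(can_submit(4))                  # True: rated, no comment needed
print(can_submit(None, "Good job"))   # False: the fluency rating is missing
print(can_report("Other"))            # False: 'Other' needs a stated reason
```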
Do not report the task if you encounter any of the following situations:
- When a portion of text is blocked for annotation or appears in light blue, please ignore it and do not report the task on that basis. See section 3.3, Ignore certain parts of the job.
- Company disclaimers or signatures: these should appear in light blue, but sometimes this is not possible. Whether or not it is highlighted in light blue, if you identify a portion of text as a company disclaimer or signature, please do not annotate or report it. See the example below, where the expected language pair is EN_ES (English to Spanish) and the signature and legal disclaimer are already translated in the source by default, without being highlighted in light blue:
Central Panel
The central panel can adjust its appearance according to the content type displayed (Chat or Tickets).
Chat
The chat conversations we translate take place between an agent, who always writes in English, and a client, who writes in their native language. Unbabel translates both directions of the exchange. For instance, in a conversation between an English agent and a German client, Unbabel translates both EN to DE (agent → client) and DE to EN (client → agent).
In the Annotation Tool you will see both sides of the exchange, displayed in two columns, with text contained in speech bubbles. The speech bubbles coming from the agent have pointers in the bottom right corner; the client’s, in the bottom left corner.
You will only be able to annotate the translation of the text coming from the agent, which appears on the right-hand side and is not grayed out. The original English text typed by the agent appears, grayed out, on the left (you’ll need it to assess the accuracy of the translation, but you won't be able to annotate it):
In addition, to help you follow the conversation, you will also see (but won’t be able to annotate) the text typed by the client (on the right) and its translation into English (on the left), both grayed out:
This is how a conversation —started by a client in Portuguese (top right)— appears:
Tickets
We translate tickets —written in English— into our customers’ clients’ native languages. In the Annotation Tool you will see the original English text displayed on the left and the translation on the right. You will only be able to annotate the translated text, which appears on the right-hand side.
This is how a ticket —written by the agent in English— appears:
For both Chats and Tickets, you can find four circles on the bottom right corner:
- Annotation (‘AN’)
- Text Highlights (‘TH’)
- Markers (‘MK’)
- Quality Estimation (‘QE’)
By default, the top 3 features are toggled on and visible as highlights in the text. To toggle them off, just click on the circles. The Quality Estimation feature is toggled off by default.
- The ‘AN’ feature will highlight the annotations you made in the target text, if any. It can show three different colors: yellow for minor errors; orange for major errors; red for critical errors.
- The ‘TH’ feature will highlight glossary entries (in dark blue), anonymizations (in grey) and no-translates (in light blue; check the section Ignore certain parts of the job for more information about no-translates). Editors cannot edit anonymizations or no-translates, and you should not annotate no-translates.
- The ‘MK’ feature will highlight the segments you’ve bookmarked (if any). To bookmark a segment, click on its bookmark icon. If you are annotating a long job with a lot of segments, it’s useful to bookmark your last annotated/revised segment so you know where to pick up the annotation job. A bookmarked segment will show a blue icon.
- The ‘QE’ (Quality estimation) feature, when toggled on, will highlight in red any text on the target side that our AI (Artificial Intelligence) system considers as representing potential issues in the translation:
⇒ AI is not perfect. Use the QE highlights to support your annotation process, but bear in mind that:
- Not all errors will be highlighted by QE
- Not everything that QE highlights is an error
- Some errors may be highlighted only partially by QE
It is therefore critical that you make sure your annotation process involves assessing the full text. One way of reducing potential biases induced by the QE feature is by enabling it at the end of the annotation process, as a last review step.
When a segment contains an annotation, a glossary term, an anonymization or a no-translate, an icon will show up above it. If you click on this icon, you can see all the items for that segment, one at a time, by clicking on the arrow (< >) icons. You can close the tooltip by pressing the 'ESC' key on your keyboard or by clicking the 'x' symbol on the tooltip.
There are, then, 4 types of items that are shown in the tooltip:
- ‘Error Annotation’, containing the name and severity of the error that is annotated, if any.
- ‘Glossary’, containing the following information:
  - ‘target_term’: This shows what is present in the target text for the glossary term.
  - ‘source_term’: This shows the original source term for which the glossary term is proposed.
  - ‘expected_translation’: This shows the original translation of the ‘source_term’. This is included to see the difference between what was proposed and what is present in the text, in case the ‘target_term’ is modified in any way.
- ‘Anonymization’, containing the type of information that was anonymized. Values can be url, token, email, name, phone number, placeholder...
- ‘No-translate’, with no additional information.
2.3. How to annotate
Before starting to annotate, please:
- Check the register required by the client; this can be found on the left side panel and also next to the clock on the top right corner;
- Be aware of the content of our Language Guidelines for the language you’re annotating.
- Be familiar with the client style guidelines, if available. It's very important that your annotation process is informed by what's in them. Some of the client requirements can contradict conventional usages collected in our Language Guidelines.
- Read the source and target texts.
⇒ Only annotate genuine mistakes, and avoid individual preferences interfering with the annotation process. If the translation you're annotating and your preferred translation (how you would have translated it) both have equal merit and are correct, it is a preferential change that shouldn't be annotated. In other words, annotations should be performed on what is wrong and not what is actually preferential.
When you identify an error, please follow the instructions provided in the following sections.
What is an error?
An error is a specific instance of an issue that has been verified to be incorrect. The current error typology is divided into eight parent error categories: Accuracy, Linguistic Conventions, Terminology, Style, Design and Markup, Locale Conventions, Audience Appropriateness and Custom, each containing their own detailed subcategories.
Selecting an error
Select the problematic unit as you would in a word processor. Double-clicking a unit also works, but with some limitations: it only works with single units delimited by whitespaces or punctuation marks. A unit can be composed of one or more of the following, or a combination of them: word(s), number(s), whitespace(s), punctuation mark(s) and isolated character(s) (a symbol or a letter between whitespaces and/or punctuation marks).
The selected unit will appear in the Selection field on the Annotation Tool menu right away.
⇒ If you select a word and the whitespace before or after it, the tool will automatically expand the selection to cover the word before or after the whitespace.
Selecting an Omission or Punctuation error at the beginning or at the end of a segment
All segments contain an extra whitespace at the beginning and at the end, in case annotators need to annotate an Omission or Punctuation (missing punctuation) error.
⇒ To select this extra whitespace, click on the ‘+’ sign that appears when hovering the mouse over the target segment. This is specific to the beginning or the end of the segments. If you need to select a whitespace at the end of any other line of that segment, you can simply position the cursor after the last character and select the following whitespace.
Certain tags require or allow multi-selection (See Multiselection for details.) This means that in some cases you can/must select more than one unit separately and one after the other. To do so, select one unit and keep the ‘ALT’ key pressed while selecting the remaining units. Make sure that you include all the units that make up the error.
Annotating overlapping errors
You can annotate overlapping errors only in certain instances: specifically when one of the co-occurring labels is Word Order, Agreement or Source Issue (and, for Chinese and Japanese, also Omission, Punctuation or Whitespace).
See this sentence for example:
The quick fox bronw jumps over the lazy dog.
There are two errors in it: a Spelling error in brown and a Word Order error involving fox and brown. The correct annotations for this sentence would then be these two:
The quick fox [bronw]SPELLING jumps over the lazy dog.
The quick [fox] [bronw]WORD ORDER jumps over the lazy dog.
Note that bronw appears then in two different annotations. Annotate one error first, and then the other.
⇒ When there are two overlapping errors but neither error is a Word Order, Agreement or Source Issue, annotate the error with the highest severity. If both errors have the same severity, choose the first one alphabetically. An example of this would be the following sentence:
The Quik brown fox jumps over the lazy dog.
‘Quik’ has both a Spelling and a Capitalization error. As they can be both classified as minor issues, Capitalization should take precedence over Spelling because it comes first alphabetically:
The [Quik]CAPITALIZATION brown fox jumps over the lazy dog.
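The precedence rules for overlapping errors can be sketched in Python as follows. This is an illustration under stated assumptions: errors are modeled as `(error_type, severity)` tuples, `resolve_overlap` and the language codes `"zh"` and `"ja"` are hypothetical, and the severity ranking assumes that Neutral is the lowest level and Critical the highest.

```python
ALWAYS_COMBINABLE = {"Word Order", "Agreement", "Source Issue"}
# Extra labels that may overlap for Chinese and Japanese, per the guidelines.
CJ_COMBINABLE = {"Omission", "Punctuation", "Whitespace"}
# Assumed ranking: Neutral is the lowest severity, Critical the highest.
SEVERITY_RANK = {"Neutral": 0, "Minor": 1, "Major": 2, "Critical": 3}


def resolve_overlap(first, second, target_lang=""):
    """Return the annotations to keep for two errors on the same unit.

    Each error is an (error_type, severity) tuple. target_lang is a
    hypothetical language code; "zh" and "ja" enable the extra labels.
    """
    combinable = set(ALWAYS_COMBINABLE)
    if target_lang in {"zh", "ja"}:
        combinable |= CJ_COMBINABLE
    if first[0] in combinable or second[0] in combinable:
        return [first, second]
    # Otherwise keep one: highest severity, then alphabetical order of type.
    best = min([first, second], key=lambda e: (-SEVERITY_RANK[e[1]], e[0]))
    return [best]


print(resolve_overlap(("Spelling", "Minor"), ("Word Order", "Minor")))
# [('Spelling', 'Minor'), ('Word Order', 'Minor')]
print(resolve_overlap(("Spelling", "Minor"), ("Capitalization", "Minor")))
# [('Capitalization', 'Minor')]
```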
Classifying the type of error
After selecting the unit, you should classify the error using the ‘Error Type’ field, either by opening the main categories drop-down menus or by typing the error category name on the search text box.
Canceling an annotation in progress
In case you need to cancel an annotation you are currently working on, please use the ‘Clear’ button, under the ‘Severity’ field.
Proposing corrections
For us to make full use of your annotations, we ask you to provide a correct translation, and only one, for each error you annotate, along with a comment (mandatory) explaining the nature of the error and justifying the chosen severity.
You should do this for every unit you annotate, except for errors for which non-native speakers can easily infer the correct translation, such as Capitalization, Whitespace or Punctuation issues.
⇒ Please only provide proposed corrections for the units you annotated, instead of full-sentence corrections (unless expressly requested).
⇒ When you annotate a terminology error (either Term not Applied or Wrong Term), you're required to provide, in addition to a correct translation (in the “Proposed Translation” box), justification for your chosen terminology, for example a URL of where it appears, in the “Comment” box. Please also expand on any concerns you might have about the term choice you've made.
⇒ If you notice a recurring error, we ask that you annotate every instance, but you only need to provide a corrected translation and comment once (per job).
To propose a correction:
- In the source text, select the word or words associated with the error in the target text. In the example below, you’d have to select “Hello”, because it doesn’t match the register required by the client.
- The “Proposed Corrections” panel will open on the right, with the text you selected appearing automatically under “Source”.
- Type the proposed correction (which in this example would be “Guten Tag”) in the “Proposed Translation” box.
- Enter a brief description (in English) of the nature of the error in the "Comment" section, as you can see in the image below.
Added annotations
Click on the ‘Annotations’ menu on the left side panel to see and review all the annotated errors you have added to the current task. Please note that this panel is automatically opened when you select a unit for annotation.
⇒ Only annotate errors that are actually present, not hypothetical ones: don’t tag potential errors that would only arise from correcting current annotations. For example, were an Omission error at the beginning of a sentence to be corrected, then the following word would have to be capitalized. But because the eventual capitalization error is not present in the original text, it should not be annotated.
Deleting added annotations
If you want to delete a unit from the ‘Annotations’ menu (on the left side panel), hover the mouse over the annotation you need to delete and click on the ‘x’ sign that appears on the upper right corner of the annotation.
Fluency rating
Fluency describes how natural a text sounds in the target language. You must rate Fluency in the ‘Task fluency’ section (on the right side panel) on a scale from 1 to 5 stars:
1 star * The text is of bad quality, and has severe accuracy, fluency and style problems and does not convey the meaning of the source text.
2 stars ** The translated text is of poor quality, and has major problems that may affect the accuracy, fluency and/or style, making it difficult to read.
3 stars *** The translated text is of fair quality, but contains some issues that particularly affect its fluency, causing the text to sound unnatural.
4 stars **** The translated text is of good quality. It is accurate and fluent, containing only minor errors that do not have much impact on the comprehension of the text.
5 stars ***** The translated text is perfect, with none or very few errors (e.g. an extra whitespace or a small spelling mistake).
If you don’t assign a Fluency rating, the job can’t be submitted.
Comment box
Below the Fluency rating there is a comment box (‘Task comment’). Please use it to leave comments regarding the overall quality of the translation, or some specific information regarding an error. Leaving such comments is optional, but highly encouraged.
Submit Annotations
When you’ve finished annotating a job, rated its fluency and optionally added comments in the comment box, you should submit it by clicking on the green ‘Submit’ button. Bear in mind that submitted jobs are no longer editable.
3. Annotation principles
⇒ The minimum unit that can be selected and annotated is a whole word, a whitespace, a punctuation mark or an isolated character.
In the following example, the version in French has an extra exclamation mark, so it’s necessary to annotate it as a Punctuation error:
[source - EN] Thank you very much.
[target - FR] Merci beaucoup!
Wrong selection → Merci [beaucoup!]PUNCTUATION
Correct selection → Merci beaucoup[!]PUNCTUATION
⇒ The maximum unit that can be selected and annotated is an entire incorrect segment.
If the issue occurs in a multiword expression, you will need to select the whole expression; if, for example, an entire sentence was translated and it shouldn't have been, you should select the entire sentence without the final punctuation mark.
In the following example, we have an Unnatural Flow error:
[source - EN] Hi, Mary here.
[target - ES] Hola, Mary aquí.
Wrong selection → Hola, [Mary aquí.]UNNATURAL FLOW
Correct selection → Hola, [Mary aquí]UNNATURAL FLOW.
3.1. Minimal Markup
You should select no more and no less than the problematic unit, with the exception of Inconsistency, Agreement, Word Order and some Punctuation errors (paired punctuation marks). This means, whenever possible, you should follow a minimal markup approach. Pay attention to the following:
- When annotating a unit, do not select any whitespaces or punctuation marks before or after the unit you want to select:
[Thankyou]WHITESPACE for your patience.
- When a portion of text is wrongly omitted in the target text, select the whitespace corresponding to the position where the omitted text should have appeared:
[source - EN] There is no limit as to how many transactions you can receive in one day.
[target - PT-BR] Não há limite quanto ao número de transações que[ ]OMISSIONpode receber em um dia.
⇒ In languages that don’t separate words with whitespaces, and when the next available character is not a whitespace (e.g. it’s a punctuation mark), you should select the unit that would immediately follow the missing text.
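The minimal markup rule for edge whitespace and punctuation can be illustrated with a short Python sketch. The function `trim_selection` and the set of edge characters are hypothetical. If the whole selection consists of whitespace or punctuation (as in a Punctuation or Omission annotation), the sketch leaves it unchanged.

```python
import string

# Characters dropped from the edges of a selection. The curly quotes and the
# inverted marks are an illustrative extension of ASCII punctuation.
EDGE_CHARS = string.whitespace + string.punctuation + "“”‘’¿¡"


def trim_selection(text, start, end):
    """Shrink the span [start, end) so its edges are not whitespace/punctuation."""
    new_start, new_end = start, end
    while new_start < new_end and text[new_start] in EDGE_CHARS:
        new_start += 1
    while new_end > new_start and text[new_end - 1] in EDGE_CHARS:
        new_end -= 1
    if new_start == new_end:
        # The selection is only punctuation or whitespace: keep it as selected.
        return start, end
    return new_start, new_end


segment = "Hola, Mary aquí."
start, end = trim_selection(segment, 5, len(segment))  # selection " Mary aquí."
print(segment[start:end])  # Mary aquí
```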
3.2. Multiselection
Sometimes errors involve more than one unit in the target text, and these units can be contiguous or not. For this reason, some issues require multiselection. These are:
- Inconsistency. Example: Please send me an [email], and attach your picture to that [e-mail]INCONSISTENCY.
- Agreement. Example: [The user] of the app [ask]AGREEMENT for instructions.
- Word Order. Example: I’m sorry, we only have the [black] [color]WORD ORDER in stock.
Some other issues can be single or multiselection. An example of a multiselection Punctuation error is that of unpaired or mismatched quotation marks or brackets:
Example: Click on the [“]Start[‘]PUNCTUATION button.
⇒ When you come across one of the above multiselection errors, you will need to select all the units individually and, only then, you can choose the error category. To do so, right after selecting the first unit, keep the ‘ALT’ or ‘OPTION’ key pressed while selecting the other unit(s).
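As a rough illustration of the multiselection rule, the Python sketch below checks whether a selection has enough units for its error category. The names `MULTISELECT_REQUIRED` and `selection_is_valid` are hypothetical. The set of categories is taken from the Inconsistency, Agreement and Word Order examples in this section.

```python
# Categories whose examples in this section use multiselection.
MULTISELECT_REQUIRED = {"Inconsistency", "Agreement", "Word Order"}


def selection_is_valid(error_type, selected_units):
    """Check that a selection has enough units for the chosen category.

    selected_units: list of separately selected text strings.
    Categories outside MULTISELECT_REQUIRED (e.g. Punctuation) may use
    one unit or several, so any non-empty selection is accepted for them.
    """
    if not selected_units:
        return False
    if error_type in MULTISELECT_REQUIRED:
        return len(selected_units) >= 2
    return True


print(selection_is_valid("Word Order", ["black", "color"]))  # True
print(selection_is_valid("Agreement", ["ask"]))              # False
print(selection_is_valid("Punctuation", ["“", "‘"]))         # True
```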
3.3. Ignore certain parts of the job
You should not annotate no-translates. These are bits that the agent blocked for translation and that haven’t been translated. They are shown in light blue in the Annotation Tool:
So, even if you see errors in one of the light blue highlights, ignore them.
4. What is the expected quality level of the texts I’m annotating?
When annotating, keep the following in mind:
- The translation must be a correct reflection of the source text.
- Spelling and punctuation should be correct.
- Grammar and syntax should be correct.
- Register recommendations should be followed.
- The translation must be fluent.
- Target language conventions outlined in the Language Guidelines should be followed.
- Client instructions and style guides, if present, take precedence over any language conventions and our Language Guidelines.
5. Error Typology
Our error typology is structured as follows: it has eight parent error categories (Accuracy, Linguistic Conventions, Terminology, Style, Locale Conventions, Audience Appropriateness, Design and Markup and Custom). Only end children are selectable in the Annotation Tool.
All error categories are visible in a dropdown list in the Annotation Tool.
⇒ When annotating, you can either click on each parent category and scroll down until you reach the error category you want to select, or you can start typing the name of the category you’re looking for, and the category will appear in the menu.
⇒ The Word Order and Agreement tags can be used in combination with any other error tags in this typology.
⇒ Source Issue must always be used in conjunction with any of the other error tags in this typology, to point out that the source of the error in the target text comes from issues in the source text. See Source Issue for more details.
See Annotating overlapping errors for more information.
5.1. ACCURACY
Accuracy errors occur when the target text does not convey in a precise manner the meaning of the source text, or when the translated unit does not fit in the context in which it appears.
Accuracy errors include differences in meaning between source and target; untranslated, missing or additional content; lexical miscollocations; MT Hallucinations and Named Entity issues. They are classified into the following categories:
5.1.1. Addition
Addition errors occur in two cases:
- When the target text contains one or more words that are not present in the source (examples 1 and 2 in the table below) and this doesn’t improve the translation in any sense.
- When there are words that are present in the source but don’t need the equivalent to be present in the target (example 3).
⇒ Do not mark Addition errors for content where the translation still reflects the source content's intention and meaning. Bear in mind also that high-quality translations allow for a degree of creativity, so you may find that the translator has added connective devices or rephrased the original content.
Examples:
Ex. | Source | Target with annotation | Reason |
1 | That way you can be sure that you were the one who made the changes. | Así puedes estar seguro de que fuiste tú quien hizo [todos]ADDITION los cambios. | Todos (meaning 'all' in Spanish) is not present in the source and it is incorrectly added in the target text. |
2 | Our system will deduct from your available credits and not your card. | Notre système effectuera un prélèvement sur vos crédits disponibles et non sur votre carte [de crédit]ADDITION. | Crédit, not present in the source, is added incorrectly to the target text in French. |
3 | You can find all the fees concerning the card here. | [Tu]ADDITION podes encontrar aqui todas as taxas que se aplicam ao cartão. | European Portuguese is a null-subject language: the subject pronoun is generally not required, and including it makes the sentence awkward. |
5.1.2. Mistranslation
Mistranslation errors occur when the target text contains incorrect translation choices. This can happen in the following situations:
- A word, multiword expression or a larger chunk of text in the target doesn't fully reflect the meaning of the source.
- A word or multiword expression sounds odd or unnatural in the target language, even if the meaning of the source text is conveyed.
⇒ When what sounds odd or unnatural is a chunk of text larger than a word or multiword expression, Unnatural Flow should be used.
⇒ Mistranslation should not be used to annotate errors on named entities or glossary entries.
Examples:
Ex. | Source | Target with annotation | Reason |
1 | It has to be done by the book. | Il doit être fait [par le livre]MISTRANSLATION | The word-for-word translation into French doesn’t work. |
2 | Fingers crossed it will be resolved soon.🤞 | MISTRANSLATION[عبرت الأصابع] سيتم حلها قريبًا.🤞 | The word-for-word translation into Arabic doesn’t work. |
3 | You can unmute by holding down the button. | Puede [desilenciar]MISTRANSLATION manteniendo pulsado el botón | Desilenciar doesn’t exist in Spanish. |
4 | Ask Siri — say ‘Siri, whose account is this’? | Pergunte a Siri - diga ‘Siri, de quem é este [relato]MISTRANSLATION’? | The right translation for account in Portuguese would be conta, not relato. |
5 | Could you elaborate on the issue? | Kan du [skapa]MISTRANSLATION problemet? | Elaborate here means explain, not create (translated as such in Swedish). |
6 | Image of the serial number on the product (indicated as S/N) | olevasta sarjanumero-kuvasta (ilmoitettu nimellä [sarjanumero]MISTRANSLATION) | S/N should have been left untranslated in Finnish, as the serial number on the product is always indicated as S/N, regardless of the target language. |
7 | Apparently you received a huge | Al parecer, recibiste una camiseta [ingente]MISTRANSLATION en lugar de tu vestido. | Ingente conveys the meaning of huge in the target, but it is not appropriate for that context and sounds unnatural in Spanish (enorme would fit better). |
Special case: mistranslations due to an unexpected language in the source text
⇒ Sometimes you will see that the source content contains portions of text in a language that is not the intended source language. In these cases, there can be mistranslations in the target text. Label these mistranslations as you normally would and use the Source Issue tag as well.
5.1.3. MT Hallucination
MT Hallucination errors can occur in the following three scenarios:
- The Machine Translation generates a completely different translation that has no relation with the source text; the translation can still sound fluent and natural without reading the source, but the meaning is completely different (examples 1 and 2 in the table below).
- The Machine Translation generates a chunk of repetitions in the target text (examples 3 and 4).
- The content is translated into gibberish: in other words, the machine generates an output made of non-words or repeated symbols (example 5).
Ex. |
Source |
Target with annotation |
Reason |
1 |
You can send us a follow-up email at this address [EMAIL]. |
[Hágame saber si tiene alguna otra pregunta]MT HALLUCINATION. |
The Spanish translation reads please let me know if you have any other questions and it’s grammatically correct and fluent, but it has no relation at all with the source. |
2 |
Please note, all refund requests are subject to approval by the relevant carrier in accordance with the shipping agreement which means it can be a monetary refund, voucher, or airline credits depending on their policy. |
[La ringraziamo per aver scelto URL-0]MT HALLUCINATION. |
The Italian translation means thanks for choosing URL-0 and it’s fluent and grammatical but with no relation at all to the source. |
3 |
Hello, how can I help you today? |
Olá, como posso [ajudar a ajudar a ajudar a ajudar a ajudar a ajudar a ajudar hoje]MT HALLUCINATION? |
The translation of help is repeated in the target text in Portuguese. |
4 |
At least, Mali-400MP, Adreno 320, PowerVR SGX544 or Nvidia Tegra 3. |
Pelo menos, [Cupão Cicero-Cicero, Adreno 320, PowerVR Cicero-Cicero or Cicero-Cicero]MT HALLUCINATION. |
The machine translation generates a chunk of repetitions in the target text in Portuguese. |
5 |
S/N (Serial Number): |
S ● / N ( シリアル番号 [aa**GO*a]MT HALLUCINATIONシリアル番号 ) : |
“aa**GO*a” is a non-word and it’s randomly generated in the Japanese translation. |
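As an illustration of the second scenario (a chunk of repetitions), the following minimal Python sketch flags a word or short word sequence that repeats back to back. It is not part of the Annotation Tool; the function name find_repeated_chunk and the thresholds min_repeats and max_ngram are assumptions made for this example, and a hit only suggests that a segment deserves a closer look.

```python
import re


def find_repeated_chunk(text, min_repeats=3, max_ngram=4):
    """Return (chunk, count) for the back-to-back repetition that covers
    the most words, or None if nothing repeats at least min_repeats times."""
    tokens = re.findall(r"\w+", text.lower())
    best, best_cover = None, 0
    for n in range(1, max_ngram + 1):
        i = 0
        while i + n <= len(tokens):
            chunk = tokens[i:i + n]
            count = 1
            # Count how many times the same n-gram immediately follows itself.
            while tokens[i + count * n:i + (count + 1) * n] == chunk:
                count += 1
            if count >= min_repeats and count * n > best_cover:
                best, best_cover = (" ".join(chunk), count), count * n
            # Skip past a repeated run; otherwise advance by one token.
            i += count * n if count > 1 else 1
    return best


target = ("Olá, como posso ajudar a ajudar a ajudar a ajudar "
          "a ajudar a ajudar a ajudar hoje?")
print(find_repeated_chunk(target))
```

For the target of example 3, the sketch reports the chunk "ajudar a" repeated six times, while a fluent sentence such as the source of that example returns None.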
5.1.4. Omission
Omission errors occur when a unit or a bigger portion of text that should be present in the target text is not, and important meaning is lost due to this omission.
⇒ Do not mark Omission errors for content where the translation still reflects the source content's intention and meaning.
⇒ Omissions can happen in three different situations:
- When there’s a unit in the source text that’s missing in the target text (examples 1 and 2 in table below).
- When target language conventions require the insertion of an extra unit (example 3).
- When there’s a bigger portion of text missing in the target text (example 4).
Ex. |
Source |
Target with annotation |
Reason |
1 |
Again, let me advise you to check the policies of the airline before booking the tickets. |
[]OMISSION Mi permetta di consigliarle di controllare le politiche della compagnia aerea prima di prenotare i biglietti. |
The translation of Again is missing in the Italian version. |
2 |
Address Line 1: |
Dirección[]OMISSION 1: |
The translation of Line is missing in the Spanish version. |
3 |
We do not have much information on this. |
Nous ne disposons pas[ ]OMISSION beaucoup d'informations à ce sujet. |
The French sentence requires the preposition de (disposer de). |
4 |
Bonsoir ACME j'ai un gros souci avec un jouet que mon fils a eu pour son Noël, un des personnages ne fait pas du tout de sons, nous avons regardé sur l'application le son est bien mis, de plus quand on télécharge l'appli ça fait planter le store. |
Good evening ACME I have a big concern with a toy my son had for Christmas, one of the characters doesn't make any sounds at all, we watched on the app the sound is well put on,[ ]OMISSION |
The source string in the French version de plus quand on télécharge l'appli ça fait planter le store was completely omitted from the translation. |
How to label Omission issues
⇒ When there is an omission in the target text, select the whitespace corresponding to where the omitted unit should have been.
⇒ In languages that don’t use whitespaces or when the character that is available for selection is not a whitespace but something else, for example, a punctuation mark, select the unit that would immediately follow the omitted word(s).
⇒ If the omitted unit is at the beginning or at the end of the sentence, click on the ‘+’ sign to label the omission error (see Selecting an error).
⇒ Don’t use the Omission tag for omitted punctuation marks. All punctuation-related issues should be annotated with the Punctuation tag.
⇒ If a word is omitted to improve the fluency of the text, and the meaning is the same, then it’s not an error.
5.1.5. Untranslated
Untranslated errors occur when a word or phrase that should have been translated was left untranslated.
⇒ If what’s been left untranslated is a named entity, you should instead use the Wrong Named Entity tag.
⇒ If what’s been left untranslated is a glossary entry, you should instead use the Wrong Term tag.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
I wish you a SWEET day!🍭🧁🍰 |
А пока, я желаю вам [SWEET]UNTRANSLATED дня! 🍭🧁🍰 |
SWEET was left untranslated in the Russian version. |
2 |
How To Make Pizza Dough |
Comment faire de [Pizza Dough]UNTRANSLATED |
Pizza Dough is not a named entity and is untranslated in the French version. |
3 |
USB POWER/BANDWIDTH ISSUES |
PROBLEMY Z ZASILANIEM USB / [BANDWIDTH]UNTRANSLATED |
Bandwidth is untranslated in the target in Polish. |
5.1.6. Wrong Named Entity
At Unbabel, we consider the following expressions to be Named Entities (NE):
- People’s names (including surnames, aliases and usernames);
- Company, team and product names (including model specifications);
- Titles (including movies, songs, TV shows, books and other publications, art pieces...);
- Country, city and all sorts of location names;
- Email addresses and URLs;
- Numerical and alphanumerical entities (including currency and measurements, phone numbers, credit card numbers, passwords, reference codes…);
- Date and time expressions;
- Postal addresses.
The Wrong Named Entity tag is used to characterize the following problems, when affecting a NE:
- Spelling, whitespace and capitalization errors.
- Any other translation problem concerning a named entity, including mistranslated, untranslated, unnecessarily translated, wrongly transliterated, omitted or added named entities.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
The coupon ACME35WZ65Z88RGJ can be redeemed immediately |
优惠券[ACME35Z65Z98TR]WRONG NAMED ENTITY可以立即兑换。 |
The coupon code in the Chinese translation is different to the original one. |
2 |
Dear Wiley, |
Gentile [Wilar]WRONG NAMED ENTITY, |
The name in the Italian version doesn’t match the original. |
Special considerations about errors found in named entities
⇒ Errors falling on a NE should be annotated either as Wrong Named Entity or as one of the Locale Conventions subcategories.
⇒ If an error falls on a NE that is also a glossary entry (it will appear highlighted in blue in the Annotation Tool), use one of the Terminology subcategories.
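The two considerations above amount to a small precedence rule, which can be sketched in Python as follows. This is only an illustration of the rule as stated here: the function name category_family and its boolean parameters are hypothetical, and whether an error on a named entity is a format problem (and therefore belongs to Locale Conventions) remains the annotator's judgment.

```python
def category_family(is_glossary_entry: bool,
                    is_named_entity: bool,
                    is_locale_format_issue: bool = False) -> str:
    """Return the family of tags to choose from for an error on a unit."""
    # Glossary entries (highlighted in blue) always take precedence.
    if is_glossary_entry:
        return "Terminology"
    # Named entities: format problems go to Locale Conventions,
    # every other problem is a Wrong Named Entity.
    if is_named_entity:
        if is_locale_format_issue:
            return "Locale Conventions"
        return "Wrong Named Entity"
    # Anything else is tagged with the remaining categories of the typology.
    return "Other categories"


# A misspelled person name that is not in the glossary:
print(category_family(is_glossary_entry=False, is_named_entity=True))
# A product name that is also a glossary entry:
print(category_family(is_glossary_entry=True, is_named_entity=True))
```

The first call prints Wrong Named Entity and the second prints Terminology, reflecting that the glossary check comes first.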
5.2. LINGUISTIC CONVENTIONS
Linguistic Conventions errors are related to the linguistic 'well-formedness' of the text, and can be assessed without regard to whether the text is a translation or not. Any error under this category implies that the target text is not linguistically and/or grammatically correct. These errors include capitalization, punctuation, spelling and grammar issues. They are classified into the following categories:
5.2.1. Agreement
Agreement errors occur when two or more words do not agree in case, number, gender, person or other morphological feature.
The minimal markup principle doesn’t apply to this error type, as you have to select both the (incorrect) unit(s) and the (correct) referent(s).
⇒ Agreement is a multiselection error. This means that, whenever you spot one, you should select first the unit(s) with the error and then the referent (that is, the correct unit). This will ensure Unbabel stores and uses your annotations correctly. For instance, in example 1 in the table below, you would select Obrigado first and then Anna.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
[...] Thank you for reaching out to us. Anna |
[...] [Obrigado] por nos ter contactado. [Anna]AGREEMENT |
Obrigado disagrees in gender with Anna in Portuguese. Note: this is a single error, not two (see Multiselection). |
2 |
The screenshot sent is in EML format. |
Lo [screenshot] [inviate]AGREEMENT è in formato EML. |
Inviate disagrees in gender and number with screenshot in Italian. Note: this is a single error, not two (see Multiselection). |
3 |
Drag and drop the Docs icon to the Files folder. Once there, rename it. |
Faites glisser et déposez [l'icône] Docs dans le dossier Files. Une fois là, [renommez-le]AGREEMENT. |
le disagrees in gender with icône in French (it should be la instead). Note: this is a single error, not two (see Multiselection). |
5.2.2. Capitalization
Capitalization errors are related to the incorrect choice of letter case.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
If you would like a refund for this charge, please let us know and we'll go from there. |
Si deseas un reembolso por este [Cobro]CAPITALIZATION, por favor, avísanos y actuaremos en consecuencia. |
Cobro should be written entirely in lowercase in the Spanish version. |
2 |
Hello, Thank you for writing back to us. |
Dobrý den, [Děkujeme]CAPITALIZATION, že jste nám napsal. |
After a greeting, Czech requires an initial lowercase letter. |
3 |
- Select First Time Installation. |
- Valitse [ensimmäisen]CAPITALIZATION kerran asennus. |
First Time Installation is a UI component, so the Finnish translation should have the first letter of the first word capitalized. |
5.2.3. Grammar
Grammar errors are related to the morphology or syntax of the text. They include the usage of the wrong part of speech, grammatical case and verb tense, mood, voice or aspect, wrong contractions and wrong function words.
⇒ While Agreement and Word Order errors are technically considered grammar errors, they should be annotated with their own respective tags.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
After you have your first customers, you will need to go to Customers and click on the Customer name. |
Después de tener sus primeros clientes, tendrá que ir a Clientes y [haciendo]GRAMMAR clic en el nombre del cliente. |
The wrong verb form (a gerund where an infinitive is required) makes the sentence ungrammatical in Spanish. |
2 |
Please feel free to reach us for any support with your ACME Products. |
Μη διστάσετε να επικοινωνήσετε μαζί μας για οποιαδήποτε υποστήριξη [με]GRAMMAR τα προϊόντα ACME. |
The use of the Greek preposition με is not correct in this context. |
3 |
I understand that you want to check in online. |
chápu, že se chcete [odbavení]GRAMMAR online. |
Wrong part of speech makes the sentence ungrammatical in Czech. |
5.2.4. Punctuation
Punctuation errors occur when a punctuation mark is used incorrectly or is missing from the translation.
⇒ Don’t use Omission or Addition to label extra or missing punctuation marks.
⇒ The Punctuation tag should also be used to annotate the misuse of hyphens (-) instead of, if applicable, en dashes (–) or em dashes (—) as bullet points or as parenthetical signs.
How to tag issues in paired punctuation marks
⇒ In the case of missing or partially missing paired punctuation marks, such as parentheses and quotation marks, select both the opening and closing marks (for example, the existing unpaired mark and the whitespace where the missing one should be placed).
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Original copy of the Proof of Purchase or Invoice (not a screenshot): |
Cópia original do comprovante de compra ou nota fiscal (não uma captura de tela)[.]PUNCTUATION |
There’s a period instead of a colon in the Brazilian Portuguese version of this sentence. |
2 |
You can also search Google for your name (be sure to include the word Pinterest in your search). |
Вы также можете поискать свое имя в Google[ ]не забудьте включить слово Pinterest в свой поиск[)]PUNCTUATION. |
There’s a missing parenthesis in the sentence in Russian. Both opening and closing positions should be labeled as a single punctuation error. |
3 |
Have a great day! |
[ ]Que tengas un buen día[ ]PUNCTUATION |
Opening and closing exclamation marks are missing in the Spanish sentence. This should be labeled as a single punctuation error. |
4 |
• Purchase receipt |
[-]PUNCTUATIONإيصال الشراء |
There’s a hyphen instead of a bullet point in the Arabic translation. |
5.2.5. Spelling
Spelling errors are related to spelling of words, including diacritic errors and extra or omitted hyphens inside a single word.
⇒ If there is a hyphen instead of whitespace between two words, the error should be tagged as Whitespace.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
This sort of damage is not covered under the warranty, but we will seek assistance from a higher support and see what we can do regarding this issue. |
Questo tipo di danno non è coperto dalla garanzia, ma chiederò comunque aiuto ai responsabili dell'assistenza per capire che cosa [zi]SPELLING può fare per quanto riguarda questo problema. |
There’s a typo in the sentence in Italian: the word zi should be si instead. |
2 |
From here, you should notice your buds boot back up and be ready for use! |
A partir [dai]SPELLING, você deve perceber seus fones ligarem de volta e prontos para uso! |
There should be a diacritical sign in the highlighted Portuguese word: daí. |
3 |
With regards to your query, I would like to inform you that your device is a Wi-Fi device. |
Con respecto a su consulta, me gustaría informarle de que su dispositivo es un dispositivo [Wi-Fi]SPELLING. |
Wifi must be written without a hyphen in the target text (in Spanish). |
5.2.6. Whitespace
Whitespace errors occur when one or more whitespaces are incorrectly added or omitted.
⇒ Please note that you should not use the Omission or Addition tags to label added or omitted whitespaces.
You should also use this tag when:
- Two words, neither of which is a named entity or a glossary entry (or a part of either), are separated by a whitespace, but they should instead be written as a single word, with or without a hyphen (example 1 in the table below).
- Two words, neither of which is a named entity or a glossary entry (or a part of either), are written as a single word, with or without a hyphen, but they should instead be separated by a whitespace (examples 2 and 3).
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Try plugging it into a different USB port, just in case there's a problem with that port. |
Prøv at tilslutte den til en anden USB[ ]WHITESPACE port, bare i tilfælde af at der er et problem med den port. |
There should be a hyphen between USB and port in Danish instead of a whitespace. |
2 |
I have contacted our partner ACME to investigate the return of the order. |
Ich habe unseren Partner[-]WHITESPACE ACME kontaktiert, um die Rückgabe der Bestellung zu untersuchen. |
There should be a whitespace instead of a hyphen in the German target text. |
3 |
Purchase Invoice |
[Satınalma]WHITESPACE faturası |
Satınalma should be split in two words in Turkish: Satın alma |
5.2.7. Word Order
Word Order errors occur when the order of the words in the target text is incorrect.
⇒ Select words in the order that they should appear. For instance, in example 1 of the table below, you should select incrível first and then Canal de YouTube ACME. This ensures we can store your annotations correctly.
⇒ Select two or three separate units (no less than two and no more than three) to tag this error (remember that each unit can contain one or more consecutive words).
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
You may also check our awesome ACME Youtube Channel for video tutorials and in-depth guides. |
Você também pode verificar nosso [Canal de YouTube ACME] [incrível]WORD ORDER para tutoriais de vídeo e guias em profundidade. |
The adjective incrível should be placed in Portuguese before Canal de YouTube ACME. |
2 |
Please don't worry, I can see that the boarding pass for the Milan-Catania flight is available. |
Non si preoccupi, vedo che la carta d'imbarco per il [Milan-Catania] [del]ADDITION [volo]WORD ORDER è disponibile. |
The correct word order in Italian would be (...) per il volo Milan-Catania. Note that del in this case is an Addition error and it is not part of the Word Order error. |
3 |
Please provide your full shipping address (Street, Apt/Unit #, City, State, Postal Code, Country). |
Proporcione su dirección de envío completa (Calle, [Apt /] [# de]WORD ORDER unidad, ciudad, estado, código postal, país). |
The correct word order in Spanish is (...) # de Apt/unidad. |
4 |
Original copy of the Proof of Purchase or Invoice. |
[De la preuve d’achat ou facture] originale [copie]WORD ORDER. |
The correct word order in French is Copie originale de la preuve d’achat ou facture. Note that originale doesn’t need to be tagged, as it should stay in the same position, and that you shouldn’t label any Capitalization error in the current word order. |
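The selection rules for the two multiselection tags described so far (Agreement and Word Order) can be summarized as a simple check. The Python sketch below is a hypothetical data model written for this illustration; the Annotation class, its fields and the selection_problems function do not describe how the Annotation Tool stores annotations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    tag: str
    units: List[str]  # selected units, in the order they were selected


def selection_problems(annotation: Annotation) -> List[str]:
    """Return a list of problems with the number of selected units."""
    problems = []
    count = len(annotation.units)
    # Word Order: two or three separate units, selected in their correct order.
    if annotation.tag == "WORD ORDER" and not 2 <= count <= 3:
        problems.append("Word Order needs two or three units")
    # Agreement: the incorrect unit(s) first, then the referent.
    if annotation.tag == "AGREEMENT" and count < 2:
        problems.append("Agreement needs the incorrect unit(s) and the referent")
    return problems


word_order = Annotation("WORD ORDER", ["incrível", "Canal de YouTube ACME"])
print(selection_problems(word_order))  # no problems: an empty list
```

The check only counts units. Whether they were selected in the right order (the corrected order for Word Order, the incorrect unit before the referent for Agreement) still has to be verified by the annotator.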
5.3. TERMINOLOGY
Just as any error that falls on a NE should be annotated as Wrong Named Entity or as one of the Locale Conventions subcategories, any error that falls on any of Unbabel’s glossary entries (highlighted in blue in the Annotation Tool) should be tagged with one of the Terminology subcategories. The Terminology subcategories should ONLY be used to annotate glossary entries.
⇒ Examples of Terminology errors include a term not fitting in context, a term with a typo, and a term that is wrongly capitalized or inflected. For example, for the term ACME support Team, the correct tag would be Wrong Term (given that “support” should have an initial upper case).
⇒ Glossary entries can consist of more than one word. If the error only applies to one word, the whole glossary entry should be selected anyway.
⇒ Glossary entries can overlap with Named Entities, but remember that as long as it’s a glossary entry (highlighted in blue), the only errors that apply are those under Terminology.
⇒ If you find a mistranslation in a unit of text that should be a glossary entry but is not (it’s not highlighted in blue), annotate it as a Mistranslation and propose it as a new glossary term (for more information see Right side panel). See glossary to judge whether something should be added to our glossaries.
⇒ A glossary term, just as any other issue in this typology, can be selected as part of a bigger Word Order or Agreement error. See Annotating overlapping errors for more details.
5.3.1. Term Not Applied
The Term Not Applied category must be used when the term present in the target text is not compliant with the one specified in the glossary.
⇒ Please note that you should ONLY use this category to annotate glossary entries.
Examples:
Ex. |
Source (with underlined glossary entry) |
Target with annotation |
Reason |
1 |
If you continue to have issues with any button on the controller after running through the steps and the gamepad test, please let us know. |
Si vous continuez à rencontrer des problèmes avec un bouton sur le contrôleur après avoir complété les étapes et le test de [gamepad]TERM NOT APPLIED, veuillez nous le faire savoir. |
The glossary term gamepad was kept the same as it is written in the source language, while its correct translation in French would be manette. |
5.3.2. Wrong Term
The Wrong Term category should be applied when the term present in the target text is compliant with our termbase, but it doesn’t fit in context or there’s an error in it (it can be a typo, a capitalization issue or any grammatical error).
⇒ Please note that you should ONLY use this category to annotate glossary entries.
Examples:
Ex. |
Source (with underlined glossary entry) |
Target with annotation |
Reason |
1 |
Thank you for contacting ACME. |
Kiitos, että otit yhteyttä [ACME]WRONG TERM. |
ACME should be inflected in Finnish (ACME:hen). |
2 |
S/N (Serial Number): |
[серийный номер]WRONG TERM. (серийный номер): |
There is a glossary entry for the acronym S/N in Russian for this specific client, and it is the spelled-out version серийный номер ('serial number'), which in this example results in duplicated text. |
3 |
* Full Shipping Address |
* [dirección de envío completa]WRONG TERM |
Dirección should be capitalized in the Spanish translation because it’s a list element. Note: the whole glossary entry should be highlighted. |
5.4. STYLE
Style errors are related to the fluency and natural readability of the target text. Any error under this category implies that the target text does not comply with the company style requirements, or uses inappropriate language style.
Style errors include register errors, inconsistency issues, the use of less natural or creative styles and also the disregard of customers’ specific style preferences and Do Not Translate instructions.
5.4.1. Company Style
Company Style errors occur when something in the target text is not compliant with the company-specific or organization-specific style guidelines or instructions.
⇒ This category must be used when:
- the Company or organization style guides or instructions differ from standard target language rules and conventions, which may or may not be included in our own Language Guidelines, or
- the target text shows non-compliance with client requirements and it cannot be annotated with any other error category.
⇒ This category should not be used to tag errors falling on glossary entries (highlighted in blue in the Annotation Tool), but instead, one of the Terminology subcategories should be used.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
To start a search, make sure that the items listed below are set. |
Per iniciar una cerca, assegureu-vos que els ítems de la llista següent [han estat seleccionats]COMPANY STYLE. |
Company style guide requirement (Catalan): Avoid overusing the passive voice. |
2 |
This will affect 25% of the staff. |
Esto afectará al |
Company style guide requirement (Spanish): Don’t use a whitespace between a number and the percentage symbol. (This client instruction goes against the convention in Spanish of leaving a whitespace between the number and the percentage symbol.) |
3 |
If you are printing only one copy of a document, or if the collate function is disabled, no separation sheets will be inserted, regardless of the separation sheet setting. |
Si imprimiu només una còpia d'un document, o si desactiveu la funció de classificació, no s'inseriran [separadors, independentment de la configuració dels separador]COMPANY STYLE. |
Company style guide requirement (Catalan): Use a direct and simple writing style (for example, by using pronouns). |
4 |
Paper 80 gms / 20 lb |
Papír 80 g/m2 / [20 lb]COMPANY STYLE |
Company style guide requirement: Please put non-metric units in round brackets behind the metric version. (In Czech usually only metric units are used) |
5.4.2. Do Not Translate
Do Not Translate errors occur when a unit (a word, a multiword expression or a phrase) was translated but it should have been left untranslated according to the client’s preferences.
⇒ Do Not Translate errors are caused by non-compliance with client requirements regarding text that should have been left untranslated. These requirements, if present, are included in the client instructions and style guides.
⇒ Requirements to not translate certain units usually apply to marketing slogans, technical words, product names, placeholders and other words that should be kept in the original language.
⇒ Only use this tag when there’s a specific client instruction regarding the need to not translate certain words, phrases or expressions.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
And it's what has made the phrase "Just Do It" synonymous with Nike. |
En het is wat de zin ["Doe het gewoon"]DO NOT TRANSLATE synoniem heeft gemaakt met Nike |
There are specific client instructions to not translate “Just Do It”, but they were not complied with in the target in Dutch. |
2 |
Once you have saved your settings, please go to [[FILE PATH]] |
Sobald Sie Ihre Einstellungen gespeichert haben, gehen Sie bitte zu [[DATEIPFAD]]DO NOT TRANSLATE |
There are specific client instructions to not translate anything contained in double brackets, but they were not complied with in the German version. |
5.4.3. Inconsistency
Inconsistency errors occur when the text shows internal translation inconsistency in the use of certain lexical units (words or multiword expressions).
⇒ The minimal markup principle doesn’t apply to this error type. You should select all elements that together make up the inconsistency, including any ‘correct’ ones. (For example, both lenken and linken in example 1 of the table below should be selected, even if only one is causing the inconsistency.) The order in which you select them is not important.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Please click on this link. [...] This link will expire in 24 hours. |
Klikk på denne [lenken].[...]Denne [linken]INCONSISTENCY utløper om 24 timer. |
Both lenk and link are correct in Norwegian, but in the same document, only one should be used. Note: this is a single error, not two (see Multiselection). |
2 |
I have refinalized your release and sent it out to stores. [..] A random percentage of releases go through an internal store review process. |
He refinalizado tu [publicación] y la he enviado a las plataformas [...]. Un porcentaje aleatorio de [lanzamientos]INCONSISTENCY pasa por un proceso de revisión interno. |
Both publicación and lanzamientos can be correct in Spanish, but in the same document only one should be used. Note: this is a single error, not two (see Multiselection). |
3 |
|
|
Provided either the infinitive or the imperative could work in context in French, there is a consistency problem when using both. |
5.4.4. Lacks Creativity
The Lacks Creativity label must be used when the translated text is correct and a close and true reflection of the source content, but it lacks language creativity and flexibility, and is not appealing or engaging enough from a linguistic point of view.
⇒ This category can only be used in specific cases, when creativity is an express requirement of the customer.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
I see it all. |
[すべてが視界に飛び込んでくる。]LACKS CREATIVITY |
A more creative translation in Japanese would be 眼下に世界がみるみる広がってゆく. This back translates to The world is expanding under my eyes. |
2 |
Handgefertigt für Sie |
[Handmade for you]LACKS CREATIVITY |
The translation from German is accurate but plain, and it lacks a flourish. A more creative translation would be Crafted by hand and heart. |
3 |
Plus, enjoy free shipping on all orders. |
[Wir bieten]LACKS CREATIVITY Gratisversand für alle Bestellungen. |
A shorter and catchier translation in German would fit better in the target text. |
5.4.5. Register
Register errors occur when the text uses the wrong register, for instance informal expressions, pronouns or verb forms, when their formal counterparts are required.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Wishing you a great day ahead. |
Ich wünsche [Ihnen]REGISTER einen schönen Tag. |
The required register for the German translation is Informal but the pronoun Ihnen is Formal. |
2 |
Warm Regards, |
[Cordiali saluti]REGISTER, |
A formal closing salutation in Italian has been used in a text that requires an Informal register. |
5.4.6. Unnatural Flow
Unnatural Flow errors cover situations where a portion of text, larger than a single word or multiword expression, is too literal a translation of the source. The meaning of the source comes through in the target, but the overall feeling of the translation is unnatural.
⇒ When what sounds odd or unnatural is a word or multiword expression, Mistranslation should be used.
⇒ In these errors, the spelling and grammar of the target text may be correct.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Zebras are ideal for animal matching |
[Zebras sind ideal, um bestimmte Tiere zu finden]UNNATURAL FLOW |
The German translation is too literal and reads like a translation, using the verb finden (find) as a translation for matching. Matching should be translated as detektieren (detect) so that the sentence reads as if it had originally been written in the target language: Zebras sind ein ideales Beispiel zur Detektion von Wildtieren. |
2 |
Improves viewing angles in applications where audience is below the display |
[Verbessert den Betrachtungswinkel in Anwendungen, bei denen sich das Publikum unterhalb des Bildschirms befindet]UNNATURAL FLOW |
The wording in German follows the English too much and is too literal. |
3 |
We had a chance to create a clearer global view of what was happening with these species, all through the lenses of a huge number of different cameras. |
[Wir hatten die Chance, durch die Linsen einer riesigen Anzahl von Kameras klar darzustellen, was mit diesen Tierarten passiert]UNNATURAL FLOW. |
The translation is too close to the EN source structure, sounding unnatural in German. A more fluent and natural translation would be Durch die Objektive einer gewaltigen Anzahl an Kameras bekamen wir die Chance, ein genaueres Bild von der weltweiten Situation dieser Tierarten zu erstellen. |
5.5. LOCALE CONVENTIONS
Locale convention errors violate locale-specific content or formatting requirements.
⇒ Company instructions can go against conventional language rules. This may happen more often when it comes to Locale Conventions. Remember client instructions always take precedence over any other considerations.
Locale convention errors can fall under the following categories:
5.5.1. Address Format
These errors occur when the address format looks inappropriate in the target text.
⇒ Select the whole address as an error (excluding the sender/recipient), even if the error is in just a portion of the address (see examples below).
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Our address is 123 Desert Bluff St., Canyonlands, AC 12345, United States |
Notre adresse est [123 St. Desert Bluff, Canyonlands, AC 12345, United States]ADDRESS FORMAT |
St is misplaced in the version in French. |
2 |
You can also visit us at our store at 32 Roadrunner Street. |
Obiščete nas lahko tudi v naši trgovini na [32 ulici Roadrunner]ADDRESS FORMAT. |
The building number is expected to be placed after the street name in Slovenian (na ulici Roadrunner 32). |
5.5.2. Currency Format
This label should be applied when the currency format in the target text is not the one that’s expected for the specific locale, or goes against client instructions. This happens, for example, when the currency name or symbol appears before the currency amount, when they should appear after.
⇒ Please keep in mind that this doesn’t necessarily mean that the currency should be localized, as this feature must be specifically requested by our customers.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
I'd like to offer you a discount code so you can enroll in this course for $11.99. |
Rád bych vám nabídl slevový kód, díky kterému se můžete do tohoto kurzu zapsat za [$11,99]CURRENCY FORMAT |
As per our Language Guidelines, Czech requires currency expressions to have a whitespace between the symbol and the amount. |
2 |
We refunded $5.00 to your ACME account. |
Abbiamo rimborsato [$5.00]CURRENCY FORMAT al suo account ACME. |
As per our Language Guidelines, the currency symbol in Italian must be written after the number, separated by a whitespace (5.00 $). |
5.5.3. Date/Time Format
This label should be used when the date or time format used in the target text is not appropriate.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
The warranty is valid until 09-02-2024 |
Die Garantie ist gültig bis zum [09.02.2024]DATE/TIME FORMAT |
American English uses the 'mm-dd-yyyy' date format, while German uses 'dd.mm.yyyy', so the source date (September 2, 2024) should read 02.09.2024 in the target. |
2 |
We can confirm an order was placed on 4/29/20. |
注文が[4/29/20]DATE/TIME FORMAT年に発注されたことを確認いたしました |
As per our Language Guidelines, Japanese uses the 'yy/mm/dd' date format. |
3 |
Our support line is open Monday to Saturday 8 AM – 8 PM |
Onze ondersteuningslijn is open van maandag tot zaterdag [8 AM – 8 PM]DATE/TIME FORMAT |
As per our Language Guidelines, Dutch uses a 24-hour time format. |
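To make example 1 concrete, the short Python sketch below reorders a date from the source format (mm-dd-yyyy) into the German one (dd.mm.yyyy) using the standard datetime module. The function name us_to_german_date is made up for this illustration; annotators are not expected to run any code.

```python
from datetime import datetime


def us_to_german_date(us_date: str) -> str:
    """Convert a date written as mm-dd-yyyy into dd.mm.yyyy."""
    parsed = datetime.strptime(us_date, "%m-%d-%Y")  # month comes first
    return parsed.strftime("%d.%m.%Y")               # day comes first


print(us_to_german_date("09-02-2024"))  # 02.09.2024
```

The target in example 1 kept the digits in the source order (09.02.2024), which a German reader would understand as 9 February rather than 2 September.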
5.5.4. Measurement Format
Measurement Format errors occur when there’s a format issue in a measurement unit other than currency.
⇒ Please keep in mind that this doesn’t necessarily mean that the measurement unit should be localized, as this feature must be specifically requested by our customers.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
As part of the treatment, you should walk 5 km a day. |
Como parte del tratamiento, debería caminar [5 kmts]MEASUREMENT FORMAT al día. |
The correct symbol for kilometers in Spanish is km, and not kmts. |
2 |
3.5 cm |
[3.5 cm]MEASUREMENT FORMAT |
Unlike English, Danish always uses a comma instead of a full stop to indicate decimals (3,5 cm). |
5.5.5. Number Format
This category must be applied when there is an issue in the number format used in the target text.
⇒ Number Format applies to format errors in numbers that are not covered by any of the other locale conventions in this section (Address, Date/Time, Currency, Measurement and Telephone Format).
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Is your member number 332 443 - X? |
Est votre numéro de membre [332 443 // X]NUMBER FORMAT? |
There is no reason why the member number should have a different format in French. |
2 |
10% |
[10%]NUMBER FORMAT |
In Czech, there should be a space between the number and the % sign. |
3 |
40,000 users |
[40,000]NUMBER FORMAT Benutzer |
In German, the digits of large numbers are grouped in threes from the right, and a full stop is used as the thousands separator (40.000). |
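The grouping rule from example 3 can be illustrated with a short Python sketch. The helper name format_german_thousands is hypothetical and the sketch only handles whole numbers; it is not how the Annotation Tool or the Language Guidelines are implemented.

```python
def format_german_thousands(value: int) -> str:
    """Group digits in threes from the right, using a full stop as the separator."""
    # Python's "," format option groups digits with commas; German expects full stops.
    return f"{value:,}".replace(",", ".")


def is_number_format_error(value: int, target_number: str) -> bool:
    """Return True when the number in the target does not use the German grouping."""
    return target_number != format_german_thousands(value)
```

For the value in example 3, format_german_thousands(40000) gives "40.000", so a target that keeps "40,000" would be flagged.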
5.5.6. Telephone Format
This label must be used when there’s an issue in the way a telephone number is presented in a specific language.
Example:
Ex. |
Source |
Target with annotation |
Reason |
1 |
(211) 555-1234 |
[(211) 555-1234]TELEPHONE FORMAT |
In German, the expected format separates the groups of digits with spaces: 211 555 1234 |
5.6. AUDIENCE APPROPRIATENESS
This category covers cases where content in the translation can be seen as unusual, invalid or inappropriate for the target audience or target locale, due to specific cultural or linguistic features. The result is a translation that is not tailored to its intended target audience or culture and can leave the reader feeling excluded.
5.6.1. Culture-specific Reference
Culture-specific Reference errors cover cases where the target text contains a culture-specific reference that’s not appropriate or understandable to the intended target audience. An example of this is the use of jargon related to sports or other culture-specific features that are not necessarily understood in the environment of the target language.
⇒ This category does not cover issues that are due to the use of an unexpected language variety in the target text: the correct category for those cases is Wrong Language Variety.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
The president's speech was a home run. |
Discursul președintelui a fost un [home run]CULTURE-SPECIFIC REFERENCE. |
The source text uses a metaphor from baseball, which would not make any sense to a Romanian audience. |
2 |
We don’t walk under ladders |
[Mes nevaikštome po kopėčiomis]CULTURE-SPECIFIC REFERENCE |
This superstition is unfamiliar to the Lithuanian reader, hence the translation does not convey the source content intent. |
5.6.2. Wrong Language Variety
Wrong Language Variety errors occur when the language variety used is not the one requested, such as using Brazilian Portuguese spelling or word choices in a European Portuguese text.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
I would also suggest you to see the below video to know how to update the device using the ACME connect application. |
También le sugiero que vea el siguiente [video]WRONG LANGUAGE VARIETY para saber cómo actualizar el dispositivo utilizando la aplicación ACME Connect. |
The intended target variety is European Spanish, but the word video (with the main stress on the e) is the Latin American variant of vídeo. |
2 |
You can access the above by tapping the icon on the top left corner of the map screen. |
[Você]WRONG LANGUAGE VARIETY pode [acessar]WRONG LANGUAGE VARIETY o acima clicando no ícone no canto superior esquerdo da [tela]WRONG LANGUAGE VARIETY do mapa. |
The intended target variety is European Portuguese, but acessar (instead of aceder), tela (instead of ecrã) and você belong to the Brazilian Portuguese variety. |
5.7. DESIGN AND MARKUP
Design and Markup errors occur when there is a problem related to design aspects (vs. linguistic aspects) of the content.
5.7.1. Markup Tag
Markup Tag errors occur when there are incorrect markup tags or tag components in the target text. In a broad sense, this includes malformed HTML characters and emojis where the emoji format is different from the source text’s.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Shipping address - format: Prefecture/Region> City> Street. |
送付先住所[ ]形式:都道府県[& gt ;]MARKUP TAG市区町>村[& gt ;]MARKUP TAG番地 |
There are whitespaces inside the two escaped HTML characters that we can find in the Japanese translation. |
2 |
2. tap the settings icon ⚙️ |
2. Нажми на иконку настроек [⚙]MARKUP TAG |
The format of the emoji differs from the one used in the source text. |
3 |
Warmly,☀️ |
С уважением, [☀]MARKUP TAG ️ |
The format of the emoji differs from the one used in the source text. |
⇒ Well-formed escaped HTML characters, such as those shown in the table below, start with a & and end with a ; and contain a combination of numbers, letters and a hash symbol, with no spaces in between. They are not errors and should not be annotated.
⇒ The following are examples of the most common, correctly escaped HTML characters in our data, which are not errors and shouldn’t be annotated:
Well-formed HTML |
What it stands for |
&#91; | [
&#93; | ]
&#124; | |
&lt; or &#60; | <
&gt; or &#62; | >
&quot; or &#34; | "
&apos; or &#39; | '
&amp; or &#38; | &
⇒ These are examples of malformed HTML codes that should be annotated (variants of the above examples, with whitespaces in between):
- & apos;
- & APOS ;
- & AMP;
- &gt ;
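The well-formedness rule described above can be approximated with regular expressions. The following Python sketch is a heuristic illustration only: the names WELL_FORMED, CANDIDATE and find_malformed_entities are hypothetical, and the patterns are an assumption about what counts as an escaped HTML character, not the logic used by the Annotation Tool.

```python
import re
from typing import List

# Well-formed: "&", then a name or a "#" followed by a number, then ";", with no spaces.
WELL_FORMED = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

# Candidate: anything that looks like an escaped character, allowing stray whitespace.
CANDIDATE = re.compile(r"&\s*#?\s*[A-Za-z0-9]+\s*;")


def find_malformed_entities(text: str) -> List[str]:
    """Return the entity-like spans in text that are not well-formed."""
    return [
        match.group(0)
        for match in CANDIDATE.finditer(text)
        if not WELL_FORMED.fullmatch(match.group(0))
    ]
```

Applied to the Japanese target in example 1, the function returns both "& gt ;" spans, while "&#91;" or "&amp;" are left alone. Because it is a heuristic, ordinary text such as "Tom & Jerry;" would also be reported, so any hit still needs a human decision.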
5.8. CUSTOM
This category covers any other issues that are not related to the above categories. Currently, it contains only one subcategory, Source Issue.
5.8.1. Source Issue
This tag must always be used in combination with another error tag, and only when an error in the target text is caused by an issue in the source text. In such cases, you should make two separate annotations: one for the actual error and one for Source Issue.
⇒ You should always assign Source Issue a Neutral severity.
⇒ The Source Issue label should also be used when part of the source text is written in the target language or in a different language, and the result is a mistranslation in the target (examples 4 and 5 in the table below).
Examples:
Ex. |
Source (expected in English) |
Target with annotation |
Reason |
1 |
Thank you for info! |
Danke für[ ]OMISSION/SOURCE ISSUE Info! |
There’s a missing determiner in German (die), because the is also missing in the source. So it should be annotated twice: once with Omission, and another with Source Issue. |
2 |
can you help me? |
[können]CAPITALIZATION/SOURCE ISSUE Sie mir helfen? |
können in German should be capitalized, but it is not. The reason is that the source can is not capitalized, but it should be. So it should be annotated twice: once with Capitalization, and another with Source Issue. |
3 |
Amzon sells books. |
[Amzon]WRONG NAMED ENTITY/SOURCE ISSUE säljer böcker. |
The named entity Amazon is mistranslated because it’s misspelled in the source. So it should be annotated twice: once with Wrong Named Entity, and another with Source Issue. |
4 |
Giv mig et øjeblik til at tjekke |
Giv mig et [SyJEubPOL]MISTRANSLATION/SOURCE ISSUE til tjekke |
The target language (Danish) is present in the source text, where English was the expected language. The original meaning was Give me a moment to check, but the current target means Give me a SyJEubPOL to check. |
5 |
BARN› |
[LADA]MISTRANSLATION/SOURCE ISSUE› |
The target language, in this case Swedish, was present in the source text, when it should be written in English. The word barn exists in both languages, but with different meanings: while in Swedish it means ‘children’, in English it means ‘agricultural building’. For that reason, the word barn was wrongly translated in the target with the English meaning. |
6 |
We only restore the club which was deleted as of a one-time exception and we can only do it once per account and perversion of ACME. |
Restauramos apenas o clube que foi excluído como uma exceção única e só podemos fazer isso uma vez por conta e [perversão]MISTRANSLATION/SOURCE ISSUE do ACME. |
The Mistranslation issue in the target text (perversão instead of por versão) is caused by a missing whitespace in the source text, where perversion should have been split into per version. |
6. Severities
Every error should be rated according to how severe its impact is on the translation. There are four different severities: minor, major, critical and neutral.
6.1. Minor
An error should be rated as minor if it doesn’t lead to a loss of meaning and it doesn’t confuse or mislead the user. It may, however, decrease the stylistic quality or fluency of the text, or make the content less appealing.
⇒ Minor errors are highlighted in yellow in the Annotation Tool.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Hello, How are you? |
Dzień dobry[]PUNCTUATION, MINOR Jak się masz? |
There’s a missing comma in the Polish version. |
2 |
Dear Aliv, |
Kære [ALIV]WRONG NAMED ENTITY, MINOR, |
The name Aliv shouldn’t be fully uppercased in the Danish version. |
3 |
Tap on the Profile icon. |
Pulsa en el [ícono]WRONG LANGUAGE VARIETY, MINOR de Perfil. |
The word ícono belongs to the Latin American Language Variety, but the expected target here is European Spanish. |
6.2. Major
An issue should be annotated as major when any of the following conditions are met:
- The usability or understandability of the content is impacted but the content is not unfit for purpose. Important: Errors due to non-compliance with Company style requirements automatically render the content unfit for purpose, so they should always be assigned at least a Major severity.
- The content is difficult to understand but not impossible.
- The error appears in a visible or important part of the content.
⇒ Major errors are highlighted in orange in the Annotation Tool.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
I'm all ears if you need further assistance or clarifications: |
Soy todo [listo]MISTRANSLATION, MAJOR si necesitas más ayuda o aclaraciones: |
There is a mistranslation in Spanish (ears as listo) that impacts the understandability of the sentence, even though the intended meaning remains clear. |
2 |
You are one of our valuable customers. |
Vous êtes un de nos [évaluables]MISTRANSLATION, MAJOR clients. |
Estimés or précieux would be a correct translation for valuable here. |
3 |
I'm Anna, your ACME Support Advisor for today. |
Sono Anna, il [suo]REGISTER, MAJOR consulente Assistenza ACME per oggi. |
The use of the formal pronoun suo in Italian is considered a major error, because it disregards the client’s specifications about the required use of the informal register. |
6.3. Critical
An error should be assigned critical severity if one or more of the following conditions are met:
- It severely changes the meaning of the original text.
- The reader cannot recover the actual meaning of the original text.
- It carries health, safety, legal or financial implications to the end user/reader.
- It damages the company’s reputation, violates geopolitical usage guidelines, causes the application to crash or negatively modifies/misrepresents the functionality of the product or service.
- It can be offensive towards an individual or a group (a religion, race, gender, etc.).
Common examples of critical errors include:
- Mistranslations, when the meaning is severely compromised.
- Omission, when a big portion of the text was omitted or any essential unit is missing from the target text.
- Wrong Named Entity, when a named entity is completely mistranslated in the target text.
- Untranslated, when it hinders comprehension.
- MT Hallucination.
- Spelling, when a typo completely changes the meaning of the translation.
⇒ Critical errors are highlighted in red in the Annotation Tool.
Examples:
Ex. |
Source |
Target with annotation |
Reason |
1 |
Please do not switch off the device. |
Vă rugăm să[ ]OMISSION, CRITICAL opriți dispozitivul. |
Not (nu) is missing in the Romanian version, so the translation has the opposite of the intended meaning. |
2 |
ACMEFitQuest is currently available. |
[ACMEFitQuirk]WRONG NAMED ENTITY, CRITICAL je v současné době k dispozici. |
The product name ACMEFitQuest has been wrongly translated into ACMEFitQuirk in Czech. This inhibits comprehension and could harm the company’s reputation. |
3 |
This information is solely used for the process of authenticating you and will not be shared with anyone else. |
Cette information est uniquement utilisée pour le processus d'authentification [mais il est totalement hors de notre contrôle]MT HALLUCINATION, CRITICAL |
The second part of the sentence in French doesn’t have any relation with the source text and is confusing to the reader. |
6.4. Neutral
This severity is currently reserved for the Source Issue category. A Neutral severity ensures that an error that is due to an issue in the source text is not penalized twice.
⇒ Neutral annotations are highlighted in green in the Annotation Tool.
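The severity rules in this section can be summarized as a small consistency check. The Python sketch below is illustrative only: the Annotation class, the HIGHLIGHT_COLOR mapping and the check_severity function are hypothetical names, and the data model is an assumption rather than the Annotation Tool's actual one.

```python
from dataclasses import dataclass
from typing import List

# Highlight colors as described for each severity in these guidelines.
HIGHLIGHT_COLOR = {
    "minor": "yellow",
    "major": "orange",
    "critical": "red",
    "neutral": "green",
}


@dataclass
class Annotation:
    category: str  # e.g. "Omission" or "Source Issue"
    severity: str  # e.g. "Minor", "Major", "Critical" or "Neutral"


def check_severity(annotation: Annotation) -> List[str]:
    """Return a list of rule violations for one annotation (empty if none)."""
    problems = []
    severity = annotation.severity.lower()
    is_source_issue = annotation.category.lower() == "source issue"
    if severity not in HIGHLIGHT_COLOR:
        problems.append(f"unknown severity: {annotation.severity}")
    if is_source_issue and severity != "neutral":
        problems.append("Source Issue must be Neutral")
    if severity == "neutral" and not is_source_issue:
        problems.append("Neutral is reserved for Source Issue")
    return problems
```

For instance, check_severity(Annotation("Source Issue", "Neutral")) returns an empty list, while an Omission rated Neutral is reported. The sketch does not check that a Source Issue is paired with a second annotation on the same unit, which would require looking at all annotations of a segment together.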
7. Glossary
Adjective
A word class that typically serves as a modifier of a noun, to denote a quality of the person or object named, to indicate its quantity or extent, or to specify a person or object as distinct from something else.
Annotation
The process of accurate identification and labeling of the errors found in the target text.
Case
In some languages, case is the way in which different words, typically nouns, adjectives, determiners and pronouns, change form depending on their function in the sentence. For example, the third person singular feminine pronoun takes the form of she when acting as subject (for example, She is happy), and the form of her when acting as an object (for example, It is for her).
Child(ren) and Parent categories
Broadly speaking, the child-parent relationship is that of a generic term or entity (parent) and a specific instance of it (child). A child has a “type-of” relationship with its parent: saying, for example, “Addition is a child category of Accuracy (errors)” is the same as saying “Addition is a type of Accuracy (errors)”.
Determiner
A word, phrase, or affix that occurs together with a noun or noun phrase and serves to express the reference of that noun or noun phrase in the context.
Error
A specific instance of an issue that has been verified to be incorrect.
Glossary
Unbabel handles company terminology via glossaries. These are terms that are important for our clients. Currently only nouns are part of the glossaries. Glossary entries appear highlighted in blue in the Annotation Tool.
Language variety
At Unbabel, a language variety can be defined as a particular choice of a language's phonemes, morphemes, structures and words that is related to a specific region (e.g. British English vs. American English).
Lexical miscollocation
The improper selection of one or more units when combined with other specific units in the target text. For example, the unexpected use of little instead of small in a little size dress.
Multiword expression
A multiword expression is an expression made up of two or more words that is perceived as a semantic unit. Idioms, compound nouns and verbal locutions can be considered as multiword expressions.
Named Entity
Broadly speaking, a named entity is an entity that can be called by a specific name, as is the case with proper names and quantities of interest. At Unbabel, we consider the following expressions to be named entities:
- People’s names (including surnames, aliases and usernames);
- Company, team and product names (including model specifications);
- Titles (including movies, songs, TV shows, books and other publications, art pieces...);
- Country, city and all sorts of location names;
- Email addresses and URLs;
- Numerical and alphanumerical entities (including currency and measurements, phone numbers, credit card numbers, passwords, reference codes…);
- Date and time expressions;
- Postal addresses.
Part of Speech (POS)
A category of words (or, more generally, of lexical items) that have similar grammatical properties. Words that are assigned to the same part of speech generally display similar syntactic behavior - they play similar roles within the grammatical structure of sentences - and sometimes similar morphology in that they undergo inflection for similar properties. Nouns, pronouns, verbs, prepositions are some examples of parts of speech.
Phrase
A phrase is a group of words that does not contain both a subject and a verb and that functions as a constituent in the syntax of a sentence. Examples of phrases are: more specific, out the window, top to bottom.
Preposition
A word that exists to express spatial or temporal relations or mark various semantic roles.
Pronoun
A word that is used in place of a noun or noun phrase.
Segment
Each one of the text portions into which the target text is divided, separated by a blank line above and a blank line below. This is the maximum unit that can be annotated.
Source text
The original text that is to be translated into another language. It’s also called just source.
Tag
Each of the names used to identify errors as described in these guidelines. Used interchangeably with label.
Target text
The translated text, also known as just target.
Unit
One or more, or a combination of the following: word(s), number(s), whitespace(s), punctuation mark(s).
Verb
A word (part of speech) that in syntax conveys an action (bring, read, walk, run, learn), an occurrence (happen, become), or a state of being (be, exist, stand).
Word
The smallest element that can be uttered in isolation with an objective or practical meaning.