Machine translation post-editing
Quality expectations
Post-editing (PE), or machine translation post-editing (MTPE), is the process of reviewing machine-translated content and editing it so that it meets the client's requirements.
There are generally two levels of post-editing, used for different purposes. Full post-editing is the more common one. Its goal is to make the most of the MT output while making the translation linguistically correct, stylistically sound, terminologically accurate, and consistent. In other words, MT is used as a productivity booster, and the final product should equal human translation in quality. Light post-editing is used for low-visibility texts where the goal is for the meaning to be understandable. The target text doesn't need to be polished or even completely error-free. Light post-editing is cheaper and faster, and a post-editor can be expected to process more text per hour than in full post-editing.
MTPE training
MTPE is a somewhat different skill from conventional translation, so we recommend undertaking some training to help you become a more efficient post-editor. RWS offers a free MTPE course, which we recommend for anyone new to post-editing. It covers the basics of MTPE well, helps you build your post-editing skills, and takes only about an hour to complete. The course requires registration but is free once you have created an account.
Speed and compensation
The speed at which a linguist can post-edit depends directly on three things: the quality of the raw MT output, the type of post-editing requested (full vs. light), and the post-editor's experience.
For example, when MT is used as a productivity booster in a full MTPE workflow, a linguist who translates an average of 250 words/hour without MT may be expected to reach 300-400 words/hour with MT. This assumes that the MT engine is well trained and produces good-quality output. For light MTPE projects, the expectation depends on which issues should be fixed and which can be ignored in the final product, but the speed may be somewhere around 600-1000 words/hour. In both cases, the expected post-editing speed is linked to the discount applied to MTed segments.
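The example figures above can be turned into time estimates with simple arithmetic. The Python sketch below assumes a hypothetical 10,000-word job. It reuses the 250 words/hour baseline and the 300-400 words/hour full MTPE range from the text. It is an illustration, not a pricing formula.

```python
def hours_needed(words: int, words_per_hour: float) -> float:
    """Hours needed to process a job at a given throughput."""
    return words / words_per_hour


def throughput_gain(baseline_wph: float, mt_wph: float) -> float:
    """Relative speed-up of post-editing over translating without MT (0.2 = 20%)."""
    return mt_wph / baseline_wph - 1


JOB_WORDS = 10_000  # hypothetical job size, for illustration only
BASELINE_WPH = 250  # words/hour without MT (example figure from the text)

print(f"Without MT: {hours_needed(JOB_WORDS, BASELINE_WPH):.1f} h")
for mt_wph in (300, 400):  # full MTPE range from the text
    print(f"Full MTPE at {mt_wph} words/hour: "
          f"{hours_needed(JOB_WORDS, mt_wph):.1f} h "
          f"({throughput_gain(BASELINE_WPH, mt_wph):+.0%} throughput)")
```

With these inputs, the job takes 40 hours without MT and roughly 25-33 hours with full MTPE, which is a 20-60% throughput gain.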
In a CAT tool, the most common way to handle MT output is to pretranslate the files before they are sent for translation. Segments in the 75%-100% match categories are populated from the translation memory, and segments in the 0%-74% match categories are populated with machine translation. Therefore, if there is a lot of TM leverage, there may not be many MTed segments, whereas a project with little TM leverage might be almost entirely machine translated. The MT discount is only applied to the match categories normally considered “no match/new words”. Everything else is compensated according to the usual TM fuzzy match categories.
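The routing and compensation logic described above can be sketched as follows. The 75% threshold comes from the text. The `Segment` type, the TM band factors, and the 30% MT discount are hypothetical placeholders, because real fuzzy-match grids and MT discounts are agreed per client and engine.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    words: int
    tm_match: int  # best TM match percentage, 0-100


MT_THRESHOLD = 75  # below this, the segment is pretranslated with MT (per the text)

# Hypothetical payable-word factors for TM fuzzy bands; real grids vary by client.
TM_BANDS = [(100, 0.25), (95, 0.40), (85, 0.60), (75, 0.80)]


def pretranslation_source(seg: Segment) -> str:
    """Where the pretranslated target text comes from: the TM or the MT engine."""
    return "TM" if seg.tm_match >= MT_THRESHOLD else "MT"


def payable_factor(seg: Segment, mt_discount: float) -> float:
    """Share of the full per-word rate paid for this segment."""
    if pretranslation_source(seg) == "MT":
        # Only "no match/new words" segments carry the MT discount.
        return 1.0 - mt_discount
    for lower_bound, factor in TM_BANDS:
        if seg.tm_match >= lower_bound:
            return factor
    raise ValueError("unreachable: TM segments are always >= MT_THRESHOLD")


def weighted_words(segments: list[Segment], mt_discount: float) -> float:
    """Payable word count after TM band factors and the MT discount."""
    return sum(s.words * payable_factor(s, mt_discount) for s in segments)


segments = [Segment(10, 100), Segment(12, 80), Segment(20, 50), Segment(8, 0)]
print(round(weighted_words(segments, mt_discount=0.30), 1))  # 31.7
```

Only the two sub-75% segments are affected by the MT discount in this sketch. The other two segments are paid according to their TM band, whatever discount is chosen.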
The exact discount for MTed segments depends on the MT engine used, so it can vary by client, project, and language. The discount is related to the potential productivity increase (see Edit distance for more details) and should therefore reflect the quality of the MT engine output. There may be some segments where the MT is not helpful at all, but the discount is based on the expected gain across the text as a whole. At Sandberg, before we agree to a specific MT discount for a given account, we generally ask to see a sample text with its MT output so that we can evaluate it fully and provide feedback.
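As a rough illustration of how post-editing effort can be quantified, the sketch below computes a character-level Levenshtein distance between the raw MT and the post-edited segment. The distance is normalised by the length of the longer string. This is one common way of measuring edit distance. The measure used in the Edit distance section may be defined differently, for example at word level, so treat the function names and the normalisation as assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        previous = current
    return previous[-1]


def edit_ratio(raw_mt: str, post_edited: str) -> float:
    """Edit distance normalised by the longer string: 0.0 = MT kept as is, 1.0 = fully rewritten."""
    longest = max(len(raw_mt), len(post_edited))
    return levenshtein(raw_mt, post_edited) / longest if longest else 0.0


print(edit_ratio("form", "from"))  # 0.5
```

Averaged over a representative sample of segments, a ratio close to 0 means most of the raw MT was kept. A consistently high ratio is the kind of evidence worth sharing with your project manager.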
However, the MT output and discount for a given project may sometimes not match the expected productivity increase. This can happen for several reasons. For example, the MT engine may have been trained on a completely different domain or text type, the source files may have been poorly prepared for a CAT tool, or the wrong MT engine may have been used. Such issues may not be noticed until the project is already in translation. For this reason, if you consistently find that large parts of the MT output are unusable and you need to make a lot of edits, report this to your project manager with examples. You may sometimes be asked specifically to evaluate the quality of the raw MT output at the beginning of a large project. This allows discounts and delivery times to be discussed and adjusted if the quality turns out to be poorer than expected.
Post-editing in practice
MTPE vs. translation
Post-editing is a very different process from translation and may be closer to revision. The table below outlines the main differences.
| POST-EDITING MT | TRANSLATING WITH A HIGH TM FUZZY MATCH | TRANSLATING FROM SCRATCH |
|---|---|---|
| Read the target segment (raw MT) | Read the source segment | Read the source segment |
| Now read the source segment | Read a TM suggestion | Start translating in your head |
| Ask yourself whether the meaning is the same | Start translating in your head | Translate |
| Ask yourself whether the things you want to change are real mistakes or preferential changes | Ask yourself whether the difference between the translation in your head and TM fuzzy match is relevant | Check your work |
| Edit raw MT as required or start translating from scratch | Edit the TM suggestion or start translating from scratch | |
| Check your work | Check your work | |
Common errors in MT output
Raw MT output is seldom flawless, and certain error types occur even in generally fluent MT output. Knowing what kinds of issues to expect can help you become a more efficient post-editor.
- Incorrectly placed or missing tags
- Incorrect spacing, especially between numerical values and units of measurement
- Capitalisation
- Hyphenation
- Spelling
- Words omitted
- Words added
- Words untranslated
- Awkward sentence structure
- Inconsistent terminology, especially if there is a glossary to be followed. For example, a single term in the source text may appear as three different terms across the MT target segments. It is your task as the post-editor to select the correct term and apply it consistently across the entire project.
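Several of the error types above can be pre-screened automatically before you read each segment. The Python sketch below flags tag mismatches, numbers directly followed by units, targets identical to the source, and missing glossary terms. Some parts are assumptions: the tag syntax, the unit list, and the "space before unit" rule. Tag formats differ between CAT tools, and spacing conventions differ between languages. The glossary lookup is a naive substring match. A flag is a prompt to check the segment, not proof of an error.

```python
import re

TAG = re.compile(r"</?\d+/?>")  # hypothetical inline tag syntax, e.g. <1>, </1>, <2/>
# A number directly followed by a unit; the unit list and the rule are assumptions.
NO_SPACE_UNIT = re.compile(r"\d(?:mm|cm|km|kg|ml|m|g|l|V|W|Hz|°C)\b")


def qa_segment(source: str, target: str, glossary: dict[str, str]) -> list[str]:
    """Return a list of suspected MT errors in one segment (a sketch, not a full QA tool)."""
    issues = []
    if sorted(TAG.findall(source)) != sorted(TAG.findall(target)):
        issues.append("tag mismatch")
    if NO_SPACE_UNIT.search(target):
        issues.append("missing space between number and unit")
    if source.strip() and source.strip() == target.strip():
        issues.append("target identical to source (untranslated?)")
    for src_term, tgt_term in glossary.items():
        # Naive check: the source term occurs, but the approved target term does not.
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            issues.append(f"glossary term '{tgt_term}' missing")
    return issues


print(qa_segment("Use a 5 mm drill.", "Verwenden Sie einen 5mm Bohrer.", {}))
# ['missing space between number and unit']
```

Identical source and target can be legitimate, for example for product names. The sketch therefore only raises a question rather than reporting an error.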
Common post-editing errors to avoid
Becoming an efficient post-editor takes some practice, but the table below gives some examples of things to look out for in your own work. Errors typically introduced during post-editing come from adapting existing text to a new context. They can therefore occur even if you would not normally make the same kinds of mistakes when translating from scratch without MT output.
| MTPE error type | Explanation |
|---|---|
| Unedited TM fuzzy matches | Differences in terms, opposite meanings, or different numbers between a TM fuzzy match and the new source are left unedited. This error is unlikely to come from the MT engine. It happens when post-editors accept TM fuzzy matches without editing them. |
| Inconsistently translated terms | The MT engine does not translate terms consistently and has no access to the project's TM or TB. Check the segments around the one you are editing for context, including locked content. Run concordance searches in the TM and TB as you normally would. |
| Translated Do Not Translate words (DNTs) | The MT engine does not recognise Do Not Translate words unless it has been specifically trained to. This is why company or product names that contain ordinary nouns or verbs get machine translated. Failing to restore DNTs to their source form is a serious post-editing mistake. |
| Mistranslations and false friends | The MT engine does not know which sense of a polysemous word is correct in context. It tends to use the words and phrases that are statistically most frequent in its training corpus. Mistranslated segments may, at first glance, read as perfectly smooth translations. It is your job to identify and correct target segments that do not convey the meaning of the source. |
| Unnoticed untranslated words, omitted words, added words | An MT engine may come across a word that is not in its corpus or is too complex to machine translate. It may then leave the word untranslated, omit it entirely, or add nonsensical extra wording, which no linguist would do. Because the surrounding text often reads fluently, such errors are easy to miss unless you compare each target segment against its source. |
| Acronyms incorrectly rendered in target | If an acronym is not part of the engine's corpus, it is likely to be rendered incorrectly. Check the acronyms in the target segments. Where needed, spell out their meaning or use the equivalent acronym in your language. |
| Wrong spelling | The MT engine rarely uses a correctly spelt word in the wrong context, whereas post-editors do (e.g. form instead of from). Watch out for typos while post-editing, and run a spell check. |
| Grammar mistakes | Incorrect word order and gender agreement errors tend to occur more frequently in MTPE. Pay particular attention to the grammar errors reported during the spell check. The MS Word spellchecker is recommended because it seems to be better at picking up grammatical errors. |
| Under-edited content | Always read through the entire translation before submitting it. Expect machine-translated content to contain false friends and spacing issues, and stay vigilant. Configure your QA settings to pick up typos, duplicate words, and trailing spaces. Watch out for term consistency. In full post-editing, the content you deliver should be of the same high quality as human translation. |
| Over-edited content | Avoid preferential changes, because they risk introducing inconsistent translations and waste your time. Follow the client-specific instructions and consult the project's TM and TB. |
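Some of the slips in the table are easy to catch mechanically: altered DNTs, duplicate words, and trailing spaces. The sketch below assumes you have a project DNT list. The product name `Acme Cloud` is hypothetical. The sketch complements, rather than replaces, your CAT tool's QA settings and a full read-through.

```python
import re

# The same word twice in a row, ignoring case (e.g. "der der").
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def final_qa(source: str, target: str, dnt_terms: list[str]) -> list[str]:
    """Flag a few common post-editing slips before delivery (a sketch, not a full QA tool)."""
    issues = []
    for term in dnt_terms:
        # A DNT present in the source should appear unchanged in the target.
        if term in source and term not in target:
            issues.append(f"DNT '{term}' altered or missing")
    match = DUPLICATE_WORD.search(target)
    if match:
        issues.append(f"duplicate word '{match.group(1)}'")
    if target != target.rstrip():
        issues.append("trailing whitespace")
    return issues


print(final_qa("Log in to Acme Cloud.",
               "Melden Sie sich bei der der Acme-Cloud an. ",
               ["Acme Cloud"]))  # hypothetical DNT list
```

Some repetitions are grammatical, such as English "had had" or German "die die", so duplicate-word hits still need a human decision.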