Edit distance

Edit distance (ED) is a measurement that we use to evaluate the quality of our MT engines.

Simply put, edit distance is the difference between the raw MT and the final translation. It represents the effort the translator has to make to convert the raw MT into a human-quality translation. A low ED means that the linguist had to make few edits to reach the final result.
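As an illustration of the idea, here is a minimal sketch of one common way to express edit distance as a percentage: the character-level Levenshtein distance between the raw MT and the final translation, divided by the length of the longer of the two strings. Both the character-level comparison and the normalisation are assumptions made for this example; the exact formula used in our own statistics may differ. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def edit_distance_percent(raw_mt: str, final: str) -> float:
    """Edit distance as a percentage of the longer string's length.

    Assumption: normalising by the longer string; other tools may
    normalise differently or work at word level.
    """
    longest = max(len(raw_mt), len(final))
    if longest == 0:
        return 0.0  # two empty segments are identical
    return 100.0 * levenshtein(raw_mt, final) / longest
```

With this definition, an unedited segment scores 0% and a segment that was rewritten from scratch approaches 100%.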

At Sandberg, we aim for a 30% ED for all our MT engines, which translates to approximately double the productivity of having no MT or TM leverage. We harvest all content from our memoQ server and process it for edit distance stats. We evaluate which engines are performing well and which are not, and set up regular feedback tasks for linguists to provide data that we can use to improve the engines.
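The per-engine evaluation could be sketched roughly as follows. This is only an illustration: it assumes that segment-level ED percentages have already been computed and exported as (engine, ED%) pairs, it treats an engine as meeting the target when its mean ED is at or below 30%, and the engine names and the function `summarise_engines` are hypothetical. It does not show how content is harvested from memoQ.

```python
from collections import defaultdict
from statistics import mean

TARGET_ED_PERCENT = 30.0  # general target mentioned in the text


def summarise_engines(segments, target=TARGET_ED_PERCENT):
    """Group segment-level ED% values by engine and compare the
    mean of each group with the target.

    segments: iterable of (engine_name, ed_percent) pairs.
    Assumption: "meets target" means mean ED% <= target.
    """
    by_engine = defaultdict(list)
    for engine, ed_percent in segments:
        by_engine[engine].append(ed_percent)

    summary = {}
    for engine, values in by_engine.items():
        mean_ed = mean(values)
        summary[engine] = {
            "segments": len(values),
            "mean_ed": mean_ed,
            "meets_target": mean_ed <= target,
        }
    return summary


# Hypothetical sample data, not real statistics.
sample = [
    ("engine_a", 20.0),
    ("engine_a", 30.0),
    ("engine_b", 45.0),
]
report = summarise_engines(sample)
```

A plain mean gives every segment the same weight. Weighting by segment length would be a reasonable alternative if short segments skew the figures.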

Edit distance   Potential increase in productivity   Potential MTPE rate
<25%            over 3 times                         65-70% of full word rate
25%             3 times                              70-85% of full word rate
30%             2 times                              85-90% of full word rate
40%             not useful

That said, edit distance is not an exact measurement, and many factors influence the ED%:

  • If the MT has been over-edited by a zealous post-editor, the ED will be higher than necessary. Conversely, a post-editor may be prone to under-editing, which gives a lower ED but may also result in a lower-quality translation. For that reason, ED is only reliable when comparing MT that has been post-edited by reasonably experienced linguists.
  • Even when edits are few and the ED is low, ED does not show how long it took the linguist to decide which edits were needed. An inexperienced post-editor may produce the same ED as a more experienced post-editor but spend a lot more time on the text.
  • While 30% ED is our general target, different languages and language pairs will give different results. For DE-EN, 35% may be a fairly good ED% but for EN-SV it would be rather bad.
mt/mtpe/edit_distance.txt · Last modified: 2024/01/17 11:49