Elephants never forget

What is a translation memory?

To give a basic definition, a translation memory (TM) is a file to which a CAT tool saves each individual segment and its translation. But that basic definition raises further questions, so let’s take a step back.

What is a CAT tool?

CAT stands for Computer-Assisted Translation. So a CAT tool is software that can help translators do their jobs.

The translator or project manager can import a source file into a CAT tool, which will then divide it up into smaller chunks, referred to as segments. At its most basic, a segment is a sentence; but an astute user can configure their CAT tool so that it splits the text in more suitable places.

This might mean helping the CAT tool understand when a full stop is not the end of a sentence. To give a very simplistic example, “Mr. Smith” or “U.S.A.” would ideally not be split up by segmentation.
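To make that concrete, here is a minimal sketch in Python of segmentation with an exception list. It is not how any particular CAT tool does it; the function name and the list of abbreviations are made up for illustration.

```python
# Hypothetical exception list; in a real CAT tool the user configures these rules.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr.", "U.S.A."}

def segment(text):
    """Split text into sentence-like segments, ignoring known abbreviations."""
    segments = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        # Only close the segment if the token is not a listed abbreviation.
        if ends_sentence and token not in ABBREVIATIONS:
            segments.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that has no closing punctuation.
        segments.append(" ".join(current))
    return segments

print(segment("Mr. Smith lives in the U.S.A. with his dog. He is happy."))
```

A rule this simple has an obvious weakness: if a sentence happens to end with “U.S.A.”, it will not be split there. That is exactly the kind of case where a user ends up fine-tuning the rules.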

Typically, a user will see the source and target texts aligned on the left and right respectively. But that is only one way in which a CAT tool helps a translator.

The main reason CAT tools can be useful for translators is the subject of this article: the translation memory (TM).

Why is a translation memory useful?

TMs are particularly useful in repetitive texts. Once a translator has confirmed their translation of a segment (sent it to the TM), the CAT tool will display that translation whenever an identical or very similar segment comes up, giving the user the option to reuse it. This can generate excellent consistency throughout the text or across several similar texts.
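Stripped right down, the confirm-and-reuse cycle looks something like the Python sketch below. The class and method names are my own invention and the example sentences are made up; a real TM also stores metadata such as dates, users and formatting.

```python
class TranslationMemory:
    """A toy TM: maps each confirmed source segment to its translation."""

    def __init__(self):
        self._units = {}

    def confirm(self, source, target):
        # "Confirming" a segment sends the source/target pair to the TM.
        self._units[source] = target

    def lookup(self, source):
        # Returns the stored translation for an identical segment, or None.
        return self._units.get(source)

tm = TranslationMemory()
tm.confirm("Premere il pulsante verde.", "Press the green button.")
print(tm.lookup("Premere il pulsante verde."))
print(tm.lookup("Premere il pulsante rosso."))
```

This only covers identical segments; similar ones need a scoring step on top.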

Naturally, this is more effective for certain texts (and, indeed, text types). Manuals jump out as an obvious example where a text is likely to contain this level of repetition and therefore require consistency.

Fuzzy matches

I mentioned that the CAT tool will not only display identical TM matches, but also similar ones. These are called fuzzy matches and are assigned a percentage based on:

  • Differences in the words used
  • Punctuation
  • Formatting
  • Surrounding text (or rather, surrounding segments)
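
To give an idea of how a percentage might be worked out, here is a minimal sketch in Python. It uses difflib.SequenceMatcher from the standard library as a stand-in for a fuzzy score. Commercial CAT tools use their own scoring, which also weighs formatting and surrounding segments, so these numbers will not match what a tool like Trados reports. The function names, the sample memory and the 70% threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher

def fuzzy_score(new_segment, stored_segment):
    """Return a similarity percentage (0-100) based on character overlap."""
    ratio = SequenceMatcher(None, new_segment, stored_segment).ratio()
    return round(ratio * 100)

def best_match(new_segment, memory, threshold=70):
    """Return (score, stored source, stored target) for the closest match,
    or None if nothing in the memory reaches the threshold."""
    best = None
    for source, target in memory.items():
        score = fuzzy_score(new_segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

# Made-up memory with a single confirmed segment.
memory = {"Press the green button.": "Premere il pulsante verde."}
print(best_match("Press the red button.", memory))
```

Identical segments score 100, while a segment with one word changed falls somewhere below that but usually stays above the threshold.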

For a bit of fun, I input some Eros Ramazzotti lyrics into my CAT tool (see below). Taking my translation with a pinch of salt (I did it quickly for an example here, no real thought went in!), you can see that although some of these segments have been repeated word for word, certain elements of formatting affect how well they match.

Screenshot taken from SDL Trados

The “CM” refers to a context match, showing that not only is this a 100% match but the previous segment is also a 100% match. I believe some CAT tools refer to this as a 101% match. The fact that the CAT tool confirms this automatically (a setting that can be changed) is something I will address below.
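
The logic behind a context match can be sketched roughly as follows in Python. This is an assumption about how such a check might work, not a description of how Trados implements it: each stored unit is assumed to remember the source segment that came before it, and the field names are hypothetical.

```python
def classify_match(previous_segment, segment, stored_units):
    """Return "CM" for a context match, "100%" for an exact match
    without matching context, or None if the segment is not stored."""
    label = None
    for unit in stored_units:
        if unit["source"] != segment:
            continue
        if unit["previous_source"] == previous_segment:
            # Same segment and same preceding segment: context match.
            return "CM"
        # Same segment, but it followed something else when it was stored.
        label = "100%"
    return label

# Made-up stored unit for illustration.
stored_units = [
    {"previous_source": "First line.", "source": "Second line.", "target": "Seconda riga."},
]
print(classify_match("First line.", "Second line.", stored_units))
print(classify_match("Another line.", "Second line.", stored_units))
```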

As a TM is a file (Trados saves its own as .sdltm, and most tools can export to the standard exchange format, .tmx), it is possible to reuse it across projects. That can produce excellent results in terms of consistency.
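
TMX is an XML-based format, so a TM exported to it can be read with ordinary tools. Here is a minimal sketch in Python using the standard library. The sample content is made up, the function name is my own, and a real TMX file carries more header information and metadata than shown here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up sample: one translation unit (tu) with two language variants (tuv).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="it" datatype="plaintext" segtype="sentence"
          adminlang="en" creationtool="example" creationtoolversion="1.0"
          o-tmf="example"/>
  <body>
    <tu>
      <tuv xml:lang="it"><seg>Premere il pulsante.</seg></tuv>
      <tuv xml:lang="en"><seg>Press the button.</seg></tuv>
    </tu>
  </body>
</tmx>"""

def read_pairs(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) pairs for the given languages."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        variants = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.findall("tuv")}
        # Skip units that lack either language.
        if source_lang in variants and target_lang in variants:
            pairs.append((variants[source_lang], variants[target_lang]))
    return pairs

print(read_pairs(SAMPLE_TMX, "it", "en"))
```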

Are there any dangers or criticisms of TMs?

Of course there are. While consistency is a key benefit of TM usage, relying on it can also lead to some severe pitfalls.

CAT tools don’t have a full grasp of context

What if the target text required the user to name an object or person specifically in the translation, where only a pronoun was required in the source? Even if the CAT tool picks up a so-called “context match”, the context may dictate that a slight change needs to be made in the target.

Used for discounting rates

Another related CAT tool functionality is the ability to analyse a source file based on internal repetitions and fuzzy matches, as well as against those contained in an existing TM. As mentioned, this can result in excellent cross-file consistency, but agencies typically ask for discounted rates for high-percentage matches.

Screenshot taken from SDL Trados
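
For anyone curious about what such an analysis does, here is a rough sketch in Python. It counts words per match band, treats a segment repeated within the file as a 100% match from its second occurrence, and then applies a discount grid. The bands and percentages are invented for illustration, and difflib.SequenceMatcher is only a stand-in for the scoring a real CAT tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical grid: (minimum match score, share of the full rate paid).
RATE_GRID = [(100, 0.30), (85, 0.60), (75, 0.80), (0, 1.00)]

def analyse(segments, memory_sources):
    """Return (words per band, weighted word count) for a list of segments."""
    seen = set()
    report = {band: 0 for band, _ in RATE_GRID}
    weighted = 0.0
    for segment in segments:
        words = len(segment.split())
        if segment in seen:
            # Internal repetition: already translated earlier in this file.
            score = 100
        else:
            score = max(
                (round(SequenceMatcher(None, segment, s).ratio() * 100)
                 for s in memory_sources),
                default=0,
            )
        seen.add(segment)
        # Put the words in the first band whose minimum the score reaches.
        for band, share in RATE_GRID:
            if score >= band:
                report[band] += words
                weighted += words * share
                break
    return report, weighted

report, weighted = analyse(
    ["Press the button.", "Press the button.", "Close the lid."],
    memory_sources=[],
)
print(report, weighted)
```

The weighted word count is what ends up on the purchase order in place of the raw word count. Actual grids are set by the agency or negotiated with the translator.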

There is much discussion within the industry over the fairness of this for vendors and the effect on quality of work. When considered in conjunction with the previous point, it could result in either inconsistent translation (the translator is not paid for the supposedly repeated segments and so does not check them for accuracy) or an underpaid vendor (the translator does look at them because they seek the best quality, but receives no compensation for the extra work).

Repeated mistakes

Consistency is only a positive attribute if the original quality is good. If mistakes are made in the first version of a repeated segment, they’ll simply be propagated throughout the text or texts.

Do I use CAT tools and TMs?

The simple answer is “yes”! Many of my clients ask me to use a certain CAT tool. I learned how to use the software from the big names in the industry during my Master’s course at Leeds and continue to undertake online training where required, advised and available.

The main CAT tools I use currently are SDL Trados Studio 2019 (soon to be updated to 2021) and Memsource.

And that’s TMs in a very small nutshell. There is much more to be said, both from a technical and technological perspective and in terms of the debate surrounding the ethical implications (the pay) for translators. It was not my aim to go into any depth on that here today, but to give those unfamiliar with the industry an insight into a key component.

But that doesn’t mean this shouldn’t be a conversation starter: contact me directly through my email address or contact page, share this article, or comment below with your views.