Like most industries, the translation business is full of jargon, acronyms, and expressions that are weird and confusing to lay people.

But few concepts are as confusing to newcomers as translation memories and glossaries. If I had a dime for every time these terms were used incorrectly (usually meaning the other one), then I wouldn’t be writing this blog today. So, here is a look at some of the differences between translation memories and terminology glossaries.

What is translation memory?

A translation memory (abbreviated as “TM”) is a database of pairs of translation segments (source language and target language). A segment is usually a sentence but can also be a sub-heading, a table cell, or other text block.
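To make the idea of a database of segment pairs concrete, here is a minimal sketch in Python. The class and method names (TranslationUnit, TranslationMemory, add, lookup) are made up for illustration and do not correspond to any particular TM product, which would typically also store metadata such as dates and client names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationUnit:
    """One stored pair: a source-language segment and its translation."""
    source: str
    target: str


class TranslationMemory:
    """A toy TM: exact lookup of previously translated segments."""

    def __init__(self):
        # Keyed by source segment so an exact match is a dictionary lookup.
        self._units = {}

    def add(self, source, target):
        self._units[source] = TranslationUnit(source, target)

    def lookup(self, source):
        """Return the stored translation, or None if the segment is new."""
        unit = self._units.get(source)
        return unit.target if unit else None

    def __len__(self):
        return len(self._units)


# Illustrative segments (short sub-headings count as segments too).
tm = TranslationMemory()
tm.add("Accident Investigations", "Enquêtes sur les accidents")
tm.add("Accident Management", "Gestion des accidents")

print(tm.lookup("Accident Management"))
print(tm.lookup("A brand new sentence."))
```

Note that the sketch only stores and retrieves what a human has already translated; nothing in it generates a translation.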

In contrast to machine translation, a translation memory is not an automated translation tool: it does not have any linguistic “knowledge”. TMs are made up of previously translated material. Essentially, a TM becomes more useful as more segments are added to it.

How does a TM work?

When a linguist is translating a document, they will have a TM open. If there is no existing TM content (such as for a new client), they will open a blank TM. Working this way creates a kind of file called an “unclean” or “bilingual” file, which contains both the source text and the translation. Once the translation is complete, the unclean file can be “cleaned” into the TM: this removes the source text from the file (leaving just the translation) and populates the TM with the pairs of segments. Doing this over and over again with translated files makes the TM larger and more useful.

TMs work on a segment-by-segment basis. The TM application looks to see if there is a match between a source segment in the text to be translated and a source segment that was previously stored in the TM database. If there is a match, then it presents this information in a log file (this is called an “analysis”). The number of matches it finds is called “leveraging”.

An exact match is called a 100% match. Segments that almost match are called fuzzy matches; they typically range from about 75% up to 99%. Segments that are brand new, or that fall below the fuzzy threshold, are called no matches. If a segment occurs more than once in the file(s) for translation, it is called a repetition. TM applications can analyze multiple files at once in order to gain leveraging across a whole project, not just one file.
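The analysis described above can be sketched roughly as follows. This is an illustration only: real TM applications use their own scoring methods, and here Python's standard difflib.SequenceMatcher stands in as an assumed similarity measure. The 75% cut-off and the function names are likewise assumptions for the example.

```python
from collections import Counter
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 75  # assumed cut-off; below this a segment is a "no match"


def best_match_percent(segment, tm_sources):
    """Return the best match level (0-100) of a segment against the TM."""
    best = 0
    for stored in tm_sources:
        if stored == segment:
            return 100  # exact match
        ratio = SequenceMatcher(None, segment, stored).ratio()
        # A non-identical segment is capped at 99 so it never counts as exact.
        best = max(best, min(99, int(ratio * 100)))
    return best


def analyze(segments, tm_sources):
    """Count segments per category, as an analysis log would report them."""
    counts = Counter()
    seen = set()
    for segment in segments:
        if segment in seen:
            # Second and later occurrences within the files are repetitions.
            counts["repetition"] += 1
            continue
        seen.add(segment)
        percent = best_match_percent(segment, tm_sources)
        if percent == 100:
            counts["100% match"] += 1
        elif percent >= FUZZY_THRESHOLD:
            counts["fuzzy match"] += 1
        else:
            counts["no match"] += 1
    return counts


# Hypothetical TM content and a hypothetical file to translate.
tm_sources = [
    "Press the red button to stop the machine.",
    "Wear safety goggles at all times.",
]
segments = [
    "Press the red button to stop the machine.",    # already in the TM
    "Press the green button to stop the machine.",  # one word differs
    "Press the red button to stop the machine.",    # repeated in the file
    "Contact your supervisor.",                     # nothing similar stored
]

report = analyze(segments, tm_sources)
for category, count in report.items():
    print(f"{category}: {count}")
```

Passing the segments of several files in one list is all it takes to get leveraging across a whole project in this sketch.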

While working on a file, the translator can then accept a translation, replace it with a new one, or edit it to match the source. Some “high fuzzy” matches (i.e., those with a 95-99% match level) will require only a little editing, whereas no matches will require a brand new translation.

Industry standards for charging for TM matches

There are some informal standards on how clients get charged for repeated text, 100% matches and fuzzy matches. Usually, repetitions and 100% matches are charged at a rate of 10% of the full per-word rate. This ensures that a linguist reviews this text (important because even though the leveraged text might not change, the context around it could have changed).

For fuzzy matches, the charge to the client depends on the degree of the match: high fuzzy matches (95%-99%) are usually charged at a lower percentage of the full rate than low fuzzy matches (75%-85%). No matches are charged at the full per-word rate.
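As a worked example of how such a pricing scheme adds up, here is a small Python sketch. Only the 10% factor for repetitions and 100% matches and the full rate for no matches come from the discussion above; the 30% and 60% factors for fuzzy matches, the word counts, and the per-word rate are invented for illustration, since actual fuzzy rates vary from vendor to vendor.

```python
from decimal import Decimal

# Fraction of the full per-word rate charged for each match category.
RATE_FACTORS = {
    "repetition": Decimal("0.10"),  # from the informal standard above
    "100% match": Decimal("0.10"),  # from the informal standard above
    "high fuzzy": Decimal("0.30"),  # assumed value for 95%-99% matches
    "low fuzzy": Decimal("0.60"),   # assumed value for 75%-85% matches
    "no match": Decimal("1.00"),    # full per-word rate
}


def quote(word_counts, per_word_rate):
    """Total charge for a project, given word counts per match category."""
    total = sum(
        words * per_word_rate * RATE_FACTORS[category]
        for category, words in word_counts.items()
    )
    return total.quantize(Decimal("0.01"))


# Hypothetical analysis result for a 1,000-word project.
word_counts = {
    "repetition": 200,
    "100% match": 300,
    "high fuzzy": 100,
    "low fuzzy": 100,
    "no match": 300,
}

total = quote(word_counts, Decimal("0.20"))
full_price = (sum(word_counts.values()) * Decimal("0.20")).quantize(Decimal("0.01"))
print(f"With TM leveraging: {total}")
print(f"Without a TM:       {full_price}")
```

Under these assumed numbers the project is billed as the equivalent of 440 full-rate words instead of 1,000.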

What is the difference between a TM and a glossary?

While TMs work on a segment-by-segment level, glossaries are lists of individual terms or short expressions. Here is an example:

English                    French
Accident                   accident
Accident Investigations    Enquêtes sur les accidents
Accident Management        Gestion des accidents

A glossary is independent of a TM. Some device and pharmaceutical companies maintain terminology glossaries by product line, by department, by division, or company wide. Sometimes they are short (50 entries) but they can run into the hundreds of entries.

One of the big differences is in what gets included. TMs usually include anything and everything translated – the more, the better.

Glossaries, on the other hand, should be built selectively. For example, clients with software applications sometimes think that the glossary should contain all software strings – and maybe even error messages, status messages, and the like. This will make the glossary unwieldy for the translators and unmanageable for the client.

Linguists generally use glossaries alongside a TM. For example, if a translator comes across a 75% match in which some of the terminology is different, they may find that terminology in the glossary. Even without a TM, a glossary can be very useful, especially for technical pieces. If the terminology is already translated, the linguist only has to translate around these terms, making their job easier and faster. It is very important to make sure that the glossary and the TM are in concordance with each other (i.e., the TM does not contain different terminology from the glossary).
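A concordance check between a glossary and a TM could look something like the sketch below. It is deliberately naive: it uses plain case-insensitive substring matching, which ignores plurals, inflected forms, and word order, so treat it as an illustration of the idea only. The function name and the sample TM entries (including their French translations) are made up for the example.

```python
def find_terminology_conflicts(tm_pairs, glossary):
    """Flag TM entries whose translation lacks the approved glossary term.

    tm_pairs: list of (source_segment, target_segment) tuples.
    glossary: dict mapping a source term to its approved translation.
    """
    conflicts = []
    for source, target in tm_pairs:
        for term, approved in glossary.items():
            term_in_source = term.lower() in source.lower()
            approved_in_target = approved.lower() in target.lower()
            if term_in_source and not approved_in_target:
                conflicts.append((source, term, approved))
    return conflicts


glossary = {"Accident Management": "Gestion des accidents"}

# Hypothetical TM content; the second entry uses a non-glossary term.
tm_pairs = [
    ("Accident Management Plan", "Plan de gestion des accidents"),
    ("Accident Management Team", "Équipe de gestion des sinistres"),
]

for source, term, approved in find_terminology_conflicts(tm_pairs, glossary):
    print(f'"{source}": expected "{approved}" for "{term}"')
```

Entries flagged this way would still need a linguist's review, since a different wording is sometimes correct in context.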

Similarities between TMs and glossaries

To be effective, both TMs and glossaries need to be managed. It is not sufficient to create a TM and think that it will never need to be touched again. Just like any database, glossaries and TMs need to be updated and cleaned as new terminology and expressions emerge, or as languages themselves change.