Remu Ogaki, Esq., Senior Project Manager, The CJK Group
The Problem with Homonyms
Every language has homonyms: words that are pronounced identically but carry different meanings.
For example:
Right (direction) vs. Right (correct)
Might (maybe) vs. might (power)
However, few words in the English language have seven different commonly used meanings, let alone meanings that are polar opposites.
In “Dueling Translations,” I introduced the concept of multiple possible translations and explained why law firms should be aware of it when translating a legal document.
It is important to understand that some languages allow a far wider variety of interpretations and require a greater degree of context than English, which in turn makes them particularly challenging for machine translation algorithms.
The Japanese word ちょっと (Chotto) is an example of such a word. It can mean anything from “a little” to “very,” and it can express irritation, intensity of feeling, large quantities, or ambiguity.
For example, if a person yells out “Chotto, chotto,” that could be translated as “excuse me!” said as an expression of irritation.
If a person is asked how many lemons they have on their lemon tree, they might respond “chotto,” meaning a few.
If a person is asked whether they would like to play soccer on Saturday, they might say “Saturday is Chotto,” meaning that Saturday is not the best day, i.e. an expression of inconvenience.
This is further complicated by the fact that, in Japanese, a speaker is culturally expected to emphasize the positive and downplay the negative, while the listener is expected to read between the lines to discern the speaker’s true intent.
For example, if we were to ask someone about the weather and that person described it as “chotto” cold, that would mean it is a bit cold.
However, if someone were asked whether the AC is set too high and whether they are uncomfortable, and gave the exact same response of “chotto” cold, contextually it would be the equivalent of saying they are quite cold while understating their discomfort out of politeness.
Culturally, one would expect a person who is only slightly uncomfortable to downplay the discomfort and say they are fine (but perhaps bump the thermostat up a degree or two). A person who goes so far as to describe their condition as “chotto” cold is therefore actually saying they are freezing.
Further complicating this issue is the prevalence of homonyms in the Japanese language as a whole. For example, a linguistic paper presented at Nankai University in 2008 noted that over one third (33.53%) of the words in the Japanese common usage dictionary had homonyms.[1] And many of these words share a pronunciation with not just two or three other words, but sometimes ten or more.
For example, “Kisha” has numerous homonyms, and one could craft a sentence like:
Kisha no Kisha wa Kisha de Kisha shimashita
貴社の記者は汽車で帰社しました
Your company’s reporter returned to your headquarters by train.
Oftentimes, these homonyms are distinguished in writing by different kanji characters, which are read as the same sound but indicate different meanings.
貴社
記者
汽車
帰社
All of the above are read as “Kisha,” but the different kanji (the Japanese term for Han characters, which were originally derived from Chinese) tell the reader which meaning is intended.
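To make the relationship between one sound and several written forms concrete, here is a minimal Python sketch that groups words by their reading. The small lexicon, the romanized readings, and the names LEXICON, group_by_reading and HOMOPHONES are assumptions made for illustration only; a real dictionary would be far larger and would carry readings in kana.

```python
from collections import defaultdict

# Hypothetical mini-lexicon for illustration:
# (written form, romanized reading, English gloss).
LEXICON = [
    ("貴社", "kisha", "your company"),
    ("記者", "kisha", "reporter"),
    ("汽車", "kisha", "train"),
    ("帰社", "kisha", "returning to the office"),
    ("少し", "sukoshi", "a little"),
]


def group_by_reading(lexicon):
    """Group written forms under the reading (sound) they share."""
    groups = defaultdict(list)
    for written, reading, gloss in lexicon:
        groups[reading].append((written, gloss))
    return dict(groups)


HOMOPHONES = group_by_reading(LEXICON)

# Every entry under "kisha" sounds the same but is written differently.
for written, gloss in HOMOPHONES["kisha"]:
    print(f"{written}: {gloss}")
```

A reading that maps to more than one written form is exactly the situation described above: the sound alone does not identify the word, and only the kanji (or the surrounding context) does.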
Why This Matters in eDiscovery
In the context of eDiscovery, this can be further complicated by typos. For example, one very common problem is that a writer accidentally uses the wrong homonym, or uses a word with many homonyms, which creates tremendous ambiguity in how a given email should be interpreted.
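One way to picture the practical consequence is a keyword search that also looks for same-sounding written forms, so that an email containing the wrong kanji is still surfaced for human review. The following is only a sketch under stated assumptions: the SAME_READING table, the sample emails, and the function names expand_term and find_candidates are all invented for illustration, and plain substring matching stands in for the proper Japanese tokenization a real review platform would need.

```python
# Hypothetical table of written forms that share a reading.
SAME_READING = {
    "kisha": ["貴社", "記者", "汽車", "帰社"],
}


def expand_term(term, table=SAME_READING):
    """Return the term plus every other written form sharing its reading."""
    for forms in table.values():
        if term in forms:
            return list(forms)
    return [term]  # no known homonyms: search for the term as-is


def find_candidates(documents, term):
    """Return (doc_id, matched_form) pairs for any same-sounding form."""
    variants = expand_term(term)
    hits = []
    for doc_id, text in documents.items():
        for form in variants:
            if form in text:  # naive substring match, for illustration only
                hits.append((doc_id, form))
    return hits


# Invented sample emails. The second one writes 汽車 (train) where
# 貴社 (your company) was probably intended.
EMAILS = {
    "mail-001": "貴社の担当者に連絡しました",
    "mail-002": "汽車の担当者に連絡しました",
    "mail-003": "明日は会議です",
}

for doc_id, form in find_candidates(EMAILS, "貴社"):
    print(doc_id, form)
```

The search only flags candidates. Whether 汽車 in the second email is a slip for 貴社 or a genuine reference to a train remains a judgment call for a reviewer who can read the surrounding context.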
Untangling these types of ambiguities, particularly in a language like Japanese that relies on context, conjecture and numerous homonyms, can create room for legal interpretation and argument. This further underscores the need for high-quality eDiscovery experts when analyzing large quantities of foreign language eDiscovery evidence.
These problems will be explored more fully in The Challenge of Homonyms: Part II!
[1] http://paper.lib.nankai.edu.cn/openfile?dbid=72&objid=48_53_56_56_52&flag=free