Article Archives - The CJK Group

Pitfalls in eDiscovery Search Terms and the 10,000 Ways to Write “Kenji Watanabe”

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

It may seem to many litigators that creating search terms across multiple languages is a simple matter of translation.

However, when trying to create search terms in a foreign language, there can be unexpected pitfalls that cause headaches in the form of false hits, unintentionally narrow or broad searches, or missed documents.

In this article, we’ll focus on how the simple task of translating a name can contain unexpected pitfalls. Read the rest of the article at LLM Law Review here:

Rem’s World: Pitfalls in eDiscovery Search Terms and the 10,000 Ways to Write “Kenji Watanabe”

Rem’s World: Treating Document Review Attorneys with Dignity

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

One of the most gratifying things I’ve heard repeatedly throughout my career in eDiscovery has been “I love working with you, you treat us like real people.”

The nature of eDiscovery, particularly in the frontlines of document review, is that the vast majority of the people work through hundreds of thousands of documents on a regular basis. It’s one of those so-called “thankless” jobs and we cut our teeth as attorneys hired on short term contracts.

Read the article here: Treating Document Review Attorneys with Dignity

It’s All Greek To Me–Lawyering & eDiscovery in Multi-cultural Times

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

Intro

Many litigators underestimate the extent of the challenges they face when litigating a case where most of the evidence is in a language they cannot read. Simple tasks they do, often without even thinking, suddenly become nearly impossible without assistance. In this article, I want to zero in on one of these tasks: the compilation of evidence and creation of a timeline.

Many good litigators describe their approach to presenting facts as “telling a story.” Pieces of evidence are strung together to create a narrative that flows, resulting in a complete picture that the litigator wishes to tell.

This process, however, is complicated when the litigator cannot read the evidence themselves, such as in the case of foreign language document reviews.

Read the rest of the article here: It’s All Greek To Me–Lawyering & eDiscovery in Multi-cultural Times

The Pros and Cons of Conducting eDiscovery Doc Review in Japan

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

The allure of conducting eDiscovery in Japan is easy to understand: cost. Hourly rates for Japanese language paralegals in Japan are significantly lower than their counterparts in America, and vastly lower than hiring a team of bilingual American attorneys.

From the perspective of risk management, it is important to understand what tradeoffs are involved in conducting a Japan-based review. Read the rest of the article here: The Pros and Cons of Conducting eDiscovery Doc Review in Japan

Rem’s World: Sherlock Holmes and eDiscovery?

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

Litigation has some surprising similarities to a Sherlock Holmes novel. It’s not enough for Holmes to point a finger at Mr. X and say, you’re the murderer. Holmes must build an argument, step by step, by laying evidence to support each and every point he wants to make.

Mr. X was in the room, because of A, B, C.

Mr. X had the murder weapon because of D, E, F.

Mr. X had the motive to kill because of G, H, I.

Issue tags in eDiscovery basically help a litigator accomplish the Holmes role in a complex case.

Issue tags are electronic markers applied to a document to indicate, for later use, which legal issues the document relates to. This can be useful because, with the click of a button, an attorney can pull up all the financial documents in the data set, for example.

However, the information the litigator wants can be in tension with another important factor: review speed and, by extension, review cost.

Pace refers to the rate at which an eDiscovery contract worker reviews documents, measured here in documents per hour.

Since it is standard industry practice to conduct eDiscovery billed by the hour, for any given number of documents, the faster the reviewers are able to move through the documents, the lower the discovery costs in litigation. I’m putting aside, for now, the other model of doc review, which prices review “per doc.” The per doc model will be explored in another installment of “Rem’s World.”

The Tradeoff

There is a tradeoff between “Information” and “pace.” The greater the number of issue tags and factual matters the reviewers are instructed to seek out and categorize, the slower the review will proceed.

I’ve personally seen instructions that request reviewers categorize documents into as many as 22 different types of issues. In practical terms, simply scrolling through 22 different issue items looking for the one you need for a particular document can be time consuming.

However, the biggest problem is the sheer number of criteria both to memorize, and to attempt to apply to each document. When there are 22 different possible ways a document can receive an issue tag, the complexity of analysis that must be applied to each and every document can be staggering.

Ideally, a reviewer is able to go through 40 or more documents an hour. This means a reviewer can spend roughly 90 seconds reading through an email and making a determination on how it should be categorized, then applying that categorization through a series of clicks.

Consider how practical it is to expect someone to repeatedly and rapidly apply a 22-prong test to a document in 90 seconds over and over for hours.

Furthermore, there is an unfortunate but common misunderstanding that the “number of clicks” a reviewer must make to finish coding a document determines the amount of time it takes to review that document. For example, if a reviewer must apply 6-7 issue tags to each responsive document before moving to the next one, this can indeed add 5-10 seconds to the review process.

This is a mistake of correlation and causation. A more complex analysis that is applied to a document will require more clicks, and so a review that requires numerous clicks is likely to be complex, and therefore slower.

However, on a slower, complex review where the average rate of review is around 30 documents per hour, the average time per document is around 120 seconds. While it is true that having numerous clicks to make per document can add up over time, the impact of this is relatively minimal; the vast majority of the increased time per document (from 90 seconds to 120 seconds or more) must be coming from somewhere else.

That “somewhere else” is the increased complexity of analysis required. The greater the number of different categories of information requested, the slower the review will go.
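The arithmetic behind this argument can be checked with a short sketch. The pace figures (40 and 30 documents per hour) and the 5-10 second click overhead come from the discussion above; the document count and hourly rate in the cost example are hypothetical values chosen only for illustration.

```python
SECONDS_PER_HOUR = 3600


def seconds_per_document(docs_per_hour: float) -> float:
    """Average time available per document at a given review pace."""
    return SECONDS_PER_HOUR / docs_per_hour


def review_cost(total_docs: int, docs_per_hour: float, hourly_rate: float) -> float:
    """Cost of an hourly-billed review: hours needed times the hourly rate."""
    return total_docs / docs_per_hour * hourly_rate


simple_review = seconds_per_document(40)   # 90.0 seconds per document
complex_review = seconds_per_document(30)  # 120.0 seconds per document
slowdown = complex_review - simple_review  # 30.0 seconds per document

# Click overhead of 5-10 seconds per document, as estimated in the article.
CLICK_OVERHEAD_SECONDS = (5, 10)

# Portion of the slowdown that clicking alone cannot explain.
unexplained = (
    slowdown - CLICK_OVERHEAD_SECONDS[1],  # clicks at their worst
    slowdown - CLICK_OVERHEAD_SECONDS[0],  # clicks at their best
)

print(f"Slowdown per document: {slowdown:.0f} s")
print(f"Not explained by clicks: {unexplained[0]:.0f}-{unexplained[1]:.0f} s")

# Hypothetical inputs: 100,000 documents billed at $50 per reviewer hour.
for pace in (40, 30):
    cost = review_cost(100_000, pace, 50.0)
    print(f"{pace} docs/hour: ${cost:,.0f}")
```

Even if every click costs the full 10 seconds, at least 20 of the 30 extra seconds per document remain unaccounted for, which is the gap attributed here to analytical complexity.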

You may ask, in all practical considerations, aren’t these the same thing?

One unfortunate way in which I have seen this misunderstanding manifest in a real-life case is where the law firm believes that they can circumvent any slowdown in review by minimizing the number of issue tags… but leaving the full complexity of the information asked for.

The firm thought that by reducing clicks, they could still ask for the same amount of information and have a rapid pace, getting the best of both worlds.

The instructions reduced the number of issue tags to 5 headings, i.e. 5 types of tags.

However, each tag had 4-6 “subheadings” any of which could require the tag.

For example, 1 tag regarding “important communications” might be required if any of the following applies:

Alex and Beth talk regarding “X” between 2008 and 2011

Charlie and Dina talk regarding “X” between 2009 and 2014

Elizabeth has any contact with Widget Company at any time

Finn speaks to anyone regarding “Y” before 2015

However, the problem is that there are still 4 separate issues that must be searched for, each of which has a different timeline, and several of which have nothing to do with each other.

Furthermore, by collapsing the 4 tags into a single generic “important communications” tag, the reviewers are deprived of any reminders that help them remember the key information they are looking for.

For example, if the tags were labeled as:

Communications: Alex/Beth 2008-2011

Communications: Charlie/Dina 2009 – 2014

Communications: Elizabeth Widgets

Communications: Finn 2015

Each tag would provide a visual cue that reminds the reviewer of what is important as they look through thousands of electronic files.

A tag that collapses all the information into a generic tag name forces each reviewer to try to memorize voluminous information on what is being searched for—particularly if the information appears only rarely in the dataset, this can easily lead to forgotten information and missed information.

What this underscores is what is truly important is the “amount of information” that is requested from the reviewers. Requesting highly detailed and voluminous information from reviewers condensed into a small handful of tags will not achieve the same pace of review as a simple review. Such attempts may even be counterproductive.

So What Does this All Mean?

The reality of litigation is that at times, you really need a huge number of categories of information from your eDiscovery vendor. Complexity can be unavoidable.

However, one should also be aware that there is a very close relationship between the amount of “information” requested from the eDiscovery vendor and the pace (and therefore the cost) of the review, and every effort should be made to keep the issue tagging requested of the reviewers to an absolute minimum, or as simple as possible. What I described above is exponentially more challenging when your electronic data is in another language. The task of the Sherlock Holmes-type eDiscovery Project Manager is no easy feat, particularly when he/she is sleuthing for clues amid a sea of non-English gibberish!

Rem’s World: The Challenge of Homonyms (Part II)

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

In Part I, we discussed how common homonyms can be in Japanese (over a third of all its words) and the many ways this can manifest in the language. I discussed this by way of an example using the Japanese ちょっと (Chotto) to illustrate.

The prevalence of homonyms leads to various possible complications when applied to eDiscovery.

For example, the most common is the way in which typos occur in Japanese.

In English, a typo is fairly easily discernible, based on context, since it involves simply a misplaced or omitted letter.

Recoggnize
Prblem
Maiintenance

An English speaker can easily identify what word was intended by the writer.

Japanese can be far more challenging because of the way typing works. In Japanese, a person starts by typing out how the word sounds. The computer then pulls up a list of every homonym for that sound and asks the writer to choose which word to use, like this:

[Image: conversion candidate list for “Keiki”]

As I noted in Part I, Japanese contains a huge number of homonyms, so a single sound can map to dozens of possible words: the word “Keiki” listed above has no fewer than 60 possible choices, each with a different meaning.

The words can bear absolutely no relation to each other in meaning. For example, 景気 (Keiki) means “economic conditions,” while 刑期 (also Keiki) means “length of sentence” and 計器 (also Keiki) means “measuring equipment.”
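The conversion step described above can be sketched in a few lines of Python. This is only an illustration: the dictionary below is a hypothetical, heavily abridged stand-in containing just the three “Keiki” words mentioned here, whereas a real input method draws on a far larger dictionary. The function names are likewise made up for this sketch.

```python
# Hypothetical, abridged conversion dictionary: typed sound -> candidate words.
# Only the three "Keiki" examples from the article are included.
CONVERSION_CANDIDATES = {
    "keiki": {
        "景気": "economic conditions",
        "刑期": "length of sentence",
        "計器": "measuring equipment",
    },
}


def candidates_for(reading: str) -> list[str]:
    """Return the words the writer would be asked to choose between."""
    return list(CONVERSION_CANDIDATES.get(reading.lower(), {}))


def same_sounding_alternatives(word: str) -> dict[str, str]:
    """Given a word found in a document, list the other words that share
    its sound, i.e. what the writer may have meant to select instead."""
    for words in CONVERSION_CANDIDATES.values():
        if word in words:
            return {w: meaning for w, meaning in words.items() if w != word}
    return {}


print(candidates_for("Keiki"))
print(same_sounding_alternatives("刑期"))
```

The second function mirrors what a reviewer does mentally when a word makes no sense in context: work back to the sound that was typed and consider which other candidate the writer probably meant to pick.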

What this means is that a “typo” in Japanese can result in a word that bears no relation in meaning to the intended one being inserted accidentally. This can be particularly prevalent in casual written conversation, such as text messaging. Imagine this in the context of an eDiscovery-based document review of Japanese! I’ve managed tens of thousands of hours of foreign language document review, mostly with The CJK Group, so I’ve literally seen it all.

To add to the confusion, a Japanese writer may misspell the word entirely, thus writing a word that merely sounds like the intended word. For example, if the person meant to write “Keiki” (measuring equipment), they may accidentally type “Keika” and choose one of a dozen or more choices, thus completely altering the intended meaning of the sentence.

As such, typos can add a great deal more confusion in interpretation for the reader in Japanese than a typo in English.

Homonyms & eDiscovery: What’s the Fuss All About?

This is a very common issue faced by eDiscovery reviewers when viewing emails and text messages in real-world circumstances. Reading the sentence within the context of the conversation, reviewers will determine which homonym was most likely intended by the writer. Alternatively, if no homonym of the word makes any sense, the reviewer will try to figure out what kind of typo the writer made and identify what other similar-sounding word was intended.

But in rare circumstances, particularly in the context of criminal investigations, the possibility that an “odd sounding” typo may represent some other form of coded language may need to be considered as well.

This is one thing that makes applications of machine translation extremely difficult as applied to Japanese.

Due to the higher cost of eDiscovery review teams with foreign language expertise, the possibility of cutting costs by having an English language review team review machine-translated documents is often raised.

Even the best machine learning AI will have difficulty correctly identifying a typo, much less identifying the correct intended meaning based on context. Nor will it be able to flag a typo as “suspicious.”

Only a team of experts deeply versed in both the cultural and linguistic nature of a foreign language can handle these challenges with consistency. And particularly, in a language like Japanese, it can be all but essential.

Rem’s World: ちょっと “Chotto” Let’s Talk Ambiguity, Homonyms & eDiscovery (Part I)

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

The Problem with Homonyms

In any language, there will be homonyms: words that are pronounced identically yet mean different things.

For example:

Right (direction) vs. Right (correct)

Might (maybe) vs. might (power)

However, few words in the English language can have 7 different commonly used meanings, including polar opposite meanings.

In “Dueling Translations” I introduced the concept of multiple possible translations and how law firms should be aware of this when translating a legal document.

It is important to understand that some languages provide for a far wider variety of interpretations and require a greater degree of context than English, which in turn can make them particularly challenging for machine translation algorithms.

The Japanese word ちょっと (Chotto) is an example of such a word, meaning anything from “a little,” to “very,” expressing everything from irritation, intensity of feeling, large quantities to ambiguity.

For example, if a person yells out “Chotto, chotto” that could be translated as “excuse me!” as an expression of irritation.

If a person is asked how many lemons he has on his lemon tree, he might respond “chotto” meaning a few.

If a person is asked if they would like to play soccer on Saturday he might say “Saturday is Chotto,” meaning it’s not the best, i.e. an expression of inconvenience.

This is further complicated by the fact that in Japanese, one is culturally expected to emphasize the positive and downplay the negative in language, but one is also expected to read between the lines to discern the true intent of the speaker.

For example, if we were to ask about the weather, and that person describes the weather as “chotto” cold, that would mean it’s a bit cold.

However, if someone were asked if the AC is on too high and if they are uncomfortable, if the person gave the exact same response of “chotto” cold, contextually it would be the equivalent of the person stating they are quite cold, but understating their discomfort out of politeness.

Since, culturally, one would expect a person who is only slightly uncomfortable to downplay this discomfort and state they are fine (but perhaps bump the thermostat up a degree or two), a person who goes so far as to describe their condition as “chotto” cold is actually stating they are quite freezing.

Further complicating this issue is the prevalence of homonyms in the Japanese language as a whole. For example, a linguistic paper presented at Nankai University in 2008 noted that over 1/3 (33.53%) of words in the Japanese common usage dictionary had homonyms.[1] And many homonyms have not just 2 or 3 different meanings, but sometimes ten or more.

For example, “Kisha” has numerous homonyms, and one could craft a sentence like:

Kisha no Kisha wa Kisha de Kisha shimashita

貴社の記者は汽車で帰社しました

Your company’s reporter returned to your headquarters by train.

Oftentimes, these homonyms are distinguished by using different kanji characters, which together are read as the same sound, but indicate different meaning by the written word.

貴社

記者

汽車

帰社

All of the above are read as “Kisha,” but the different kanji (the Japanese word for Han characters, which were originally derived from Chinese) tell the reader which meaning is intended.
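A small sketch can make the point concrete: in the romanized sentence the four words are indistinguishable, while in the kanji sentence each is a distinct string that can be located on its own. The English glosses and the helper name `locate` are illustrative assumptions for this sketch, not part of any review tool.

```python
ROMANIZED = "Kisha no Kisha wa Kisha de Kisha shimashita"
KANJI = "貴社の記者は汽車で帰社しました"

# The four "Kisha" words from the example sentence, with rough glosses.
KISHA_WORDS = {
    "貴社": "your company",
    "記者": "reporter",
    "汽車": "train",
    "帰社": "returning to the company",
}


def locate(text: str, words: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (position, word, gloss) for each word found in the text,
    ordered by where it first appears."""
    hits = [(text.find(w), w, gloss) for w, gloss in words.items() if w in text]
    return sorted(hits)


# By sound alone, all four words collapse into one repeated token.
print(ROMANIZED.lower().split().count("kisha"))  # 4

# In kanji, each word is distinct and can be found individually.
for position, word, gloss in locate(KANJI, KISHA_WORDS):
    print(position, word, gloss)
```

The same property cuts the other way in a document review: a search for one of these written forms will not match a document in which the writer accidentally selected one of the other three.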

So Why This Matters in eDiscovery

In the context of eDiscovery, this can be further complicated by typos. For example, one very common problem is for a writer to accidentally use the wrong homonym, or to use a word with so many homonyms that it creates tremendous ambiguity in how to interpret a given email.

Untangling these types of ambiguities, particularly in a language that relies on context, conjecture and numerous homonyms like Japanese, can create room for legal interpretation and argument. This further underscores the need for high quality eDiscovery experts when analyzing large quantities of foreign language eDiscovery evidence.

These problems will be explored more fully in The Challenge of Homonyms: Part II!

[1] http://paper.lib.nankai.edu.cn/openfile?dbid=72&objid=48_53_56_56_52&flag=free

Rem’s World: ALTA Test, Blind Faith & eDiscovery

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

Prior to The CJK Group, I had the opportunity to work for a variety of different agencies on Japanese eDiscovery Projects as a contract employee.

One thing that was shockingly common was to walk into a room, turn to the person next to me and say, “Hi, nice to meet you” in Japanese, and for the person to respond “Oh, I can’t actually speak in Japanese.” This was a Japanese review and the eDiscovery managed review provider was retained by a major US law firm to provide high quality Japanese document review. To be frank, this is something that occurs in virtually all foreign language reviews. Trust me, I know. I’ve been involved in tens of thousands of billable hours of foreign language review. The stories I can tell…

I would often think, how is this possible, let alone common?

A major factor is blind reliance on the ALTA test to determine linguistic competence among reviewers assigned to non-English projects. I’ll cover some other reasons for this but let’s dive into the ALTA test for now.

ALTA is a language proficiency testing service (ALTA Language Services, Inc.) that can test for various proficiency levels and is used to test language qualification for bilingual customer service representatives, nurse practitioners, law enforcement officers, sales representatives, flight attendants, personal bankers, physicians, government linguists, and legal document reviewers.

The ALTA test is a short multiple choice language test that can be completed in under an hour by most test takers. It requires that the test taker read a short passage in the target language, then answer a series of multiple-choice questions designed to test comprehension of that short passage.

For purposes of testing for competence in foreign language review, this format presents several problems.

First, the time requirements of reading the passage are quite generous. A reviewer who can read a short email but takes 4-5 minutes (instead of 30 seconds) to read it is not qualified to conduct review.

Second, as the test is not proctored and administered largely in an “honor system” environment, there are ways for the unscrupulous to engage in cooperative test taking, sharing of answers, etc. The incentive in foreign language pay, which is typically double or triple the rate of English document review, provides the motivation. Think $30 USD versus $95 USD. If you can get on enough of these reviews, you could make more than the Associate at Big Law.

This is not to say the ALTA test is not of value; it can play a valuable gatekeeper role by excluding from consideration individuals who are clearly unqualified for a foreign language eDiscovery reviewer role.

However, the ALTA test cannot take the place of a bilingual manager who can evaluate the work provided by the eDiscovery reviewers on an ongoing basis. To place blind faith in the ALTA test results as a marker of linguistic competence can be dangerous.

To make matters worse, the typical review model assigns an English-only Project Manager to manage a foreign language review. For obvious reasons, these managers cannot personally evaluate the quality of the work of a foreign language reviewer.

These managers can only evaluate the quality of the work of their team members by reliance on someone that speaks the language—but they have no personal ability to evaluate substantive review-based decisions. They do not speak or read the language and have no independent mechanism to examine the original electronic file in its native language.

While the manager can choose someone on the team to quality control the rest of the team and report back, the manager is incapable of personally verifying whether that person is competent.

I once worked on a team where the English-speaking manager had placed a person in charge of overseeing the rest of the team who clearly did not speak or know Japanese very well. Since the Project Manager spoke no Japanese, it was months before this error in personnel was discovered, requiring rechecking tens of thousands of documents at major client expense.

The ALTA test is a useful first step in evaluating the linguistic competency of an eDiscovery team member, but it can be a high-risk decision to rely upon it blindly. Be very careful and ask many questions if your vendor has only an ALTA test. It takes much more than an ALTA test to determine foreign language review aptitude.

Rem’s World: Dueling Translations

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

During litigation involving foreign languages, there’s a common assumption that translation is simply a matter of “correctly” changing the language from A to B. Especially in a legal context, this is an extremely unfortunate misunderstanding.

It can be very common that there are multiple potential translations of a given phrase, sometimes with polar opposite meanings that can radically influence the outcome of a case.

In other words, there can be room for debate in deciding what is the “correct” translation, and if there are multiple valid translations, you need a support team there to help you press for the most advantageous interpretation.

In Japanese, much more so than in English, the rules of grammar permit you to omit the subject, verb OR object of the sentence and leave it implicit. This can make sentences extremely ambiguous, allowing for multiple interpretations.

For example, this single Japanese phrase can be interpreted to mean opposite things:

Example 1

悪い事、教えてあげれば大親友

Interpretation 1:

Someone who teaches you [the fun of] the wrong thing is the best of friends

i.e. “You can become best friends with someone who’ll do bad things with you,”

Interpretation 2:

Someone who teaches you [what is] wrong is the best of friends

i.e. “someone who teaches you right from wrong can become best friends with you”

The bracketed text is the part that does not appear explicitly in the original sentence, but either of the translations (without context) would be a “correct” translation.

Example 2

人の嫌がる事を進んでする人になりましょう

Interpretation 1

Become the person who steps forwards to do the thing nobody wants to do

i.e. do what other people won’t want to do

Interpretation 2

Become the person who steps forwards to do the thing nobody wants [you] to do

i.e. do what makes other people uncomfortable

Certainly, not every sentence provides room for radically different interpretation, and it can also often be the case that the context of the conversation leaves only 1 valid interpretation of the sentence.

However, what these examples demonstrate is the possibility that what may otherwise look innocuous in one possible interpretation could be extremely damning or highly legally significant with a different interpretation.

It’s important to be able to rely on your foreign language eDiscovery experts to alert you to any such issues in a real-world case.

Rem’s World: eDiscovery & Seaweed

Remu Ogaki, Esq., Senior Project Manager, The CJK Group

What does it mean when a Japanese manager tells his subordinate to treat an official to “wakame (seaweed)”?

In cross border legal matters, whom you choose as your multilingual eDiscovery partner for analysis of evidence can make the difference between finding the smoking gun and missing it entirely.

Many people are under the misconception that choosing such a vendor is simply a matter of finding any attorney who speaks the right language. This is akin to saying, “find me any attorney who ever tried a personal injury case, it doesn’t matter who.”

For example, a quick google search for “Japanese Document Reviewer” will reveal dozens of job posts sourcing candidates based on the qualification of “speaks Japanese” and “bar admittance.”

Where this attitude can get you in trouble is where evidence can be unexpectedly complicated. My friends, litigation is often complicated. Facts are not always clear. Add to this Japanese, well, it’s complicated.

Let’s return to the example at the top: What does it mean when a Japanese manager tells his subordinate to treat an official to “wakame (seaweed)”?

Say we’re dealing with an FCPA issue, a bribery investigation. You want to find any instance where inappropriate benefits were offered to entice government officials to do your client’s bidding.

If you hire a random “Joe” off the street who took Japanese in college, he might proudly tell you “Wakame” is a type of edible seaweed, so he is probably telling his subordinate to take the official to a restaurant that specializes in seaweed.

You might think to yourself, that sounds reasonable, toss it in a pile of mildly concerning relevant documents, and forget about it. You move on. This issue just flies below the radar and nobody flags it for further analysis.

What should you get when you hire a team of experienced Japanese speaking attorneys with years of insight analyzing Japanese electronic evidence and a deep cultural understanding of Japanese coupled with strong legal analysis?

Let me tell you…

Wakame (seaweed) can be a slang reference for a woman’s pubic hair. “Treat the official to wakame” may be saying in actuality, “treat the official to a prostitute.”

This is an actual example from a real-life investigation conducted by myself and my team. Further investigation uncovered disguised receipts from brothels and other types of inappropriate dealings. A risk management approach to foreign language document review would find many inadequacies in the prevailing model of non-English document review. Ensuring you have the right person managing the review and the appropriate controls in place to identify potential vulnerabilities is critical. This is the foundation that any review is built upon.

Stay tuned for more multi-lingual CJK stories from the eDiscovery frontlines in a series we call “Rem’s World!”